This document is the summary of the Introduction to R workshop.

All correspondence related to this document should be addressed to:

Omid Ghasemi (Macquarie University, Sydney, NSW, 2109, AUSTRALIA)

Email:

1 Research Question

The aim of the study is to test if simple arguments are more effective in belief revision than more complex arguments. To that end, we present participants with an imaginary scenario (two alien creatures on a planet) and a theory (one creature is predator and the other one is prey) and we ask them to rate the likelihood truth of the theory based on a simple fact (We adapted this method from Gregg et al.,2017; see the original study here). Then, in a between-subject manipulation, participants will be presented with either 6 simple arguments (Modus Ponens conditionals) or 6 more complex arguments (Modus Tollens conditionals), and they will be asked to rate the likelihood truth of the initial theory on 7 stages.

The first stage is the base rating stage. The next three stages include supportive arguments of the theory and the last three arguments include disproving arguments of the theory. We hypothesized that the group with simple arguments shows better persuasion (as it reflects in higher ratings for the supportive arguments) and better dissuasion (as it reflects in lower ratings for the opposing arguments).

In the last part of the study, participants will be asked to answer several cognitive capacity/style measures including thinking style (CRT), open-mindedness (AOT-E), reasoning ability (mindware), and numeracy scales. We hypothesized that cognitive ability, cognitive style, and open-mindedness are positive predictors of persuasion and dissuasion. These associations should be more pronounced for participants in the group with complex arguments because the ability and willingness to engage in deliberative thinking may favor participants to assess the underlying logical structure of those arguments. However, for participants in the simple group, the logical structure of arguments is more evident, so participants with lower ability can still assess the logical status of those arguments.

Thus, our hypotheses for this experiment are as follows:

  • Participants in the group with simple arguments have higher ratings for supportive arguments (They are more easily persuaded than those in the group with complex arguments).

  • Participants in the group with simple arguments have lower ratings for opposing arguments (They are more easily dissuaded than those in the group with complex arguments).

  • There are significant associations between thinking style (CRT), open-mindedness (AOT-E), reasoning ability (mindware), and numeracy scales with both persuasion and dissuasion indexes in each group and in the entire sample. The relationship between these measures should be stronger, although not significantly, for participants in the group with complex arguments.

2 Getting Ready

First, we need to design the experiment. For this experiment, we use online platforms for data collection. There are several options such as Gorilla, JSpsych, Qualtrics, psychoJS (pavlovia), etc. Since we do not need any reaction time data, we simply use Qualtrics. For an overview of different lab-based and online platforms, see here.

Next, we need to decide on the number of participants (sample size). For this study, we do not sue power analysis since we cannot access more than 120 participants. However, it is highly suggested calculate sample size using power estimation. You can find some nice tutorials on how to do that here, here, and here.

After we created the experiment and decided on the sample size, the next step is to preresigter the study. However, it would be better to do a pilot with 4 or 5 participants, clean all the data, do the desired analysis, and then pre-register the analysis and those codes. You can find the preregistration form for the current study here.

Finally, we need to restructure our project in a tidy folder with different sub-folders. Having a clean and tidy folder structure can save us! There are different formats of folder structure (for example, see here and here), but for now, we use the following structure:

3 Introduction to R

# load libraries
library(tidyverse)
library(here)
library(janitor)
library(broom)
library(afex)
library(emmeans)
library(knitr)
library(kableExtra)
library(ggsci)
library(patchwork)
library(skimr)
# install.packages("devtools")
# devtools::install_github("easystats/correlation")
library("correlation")
options(scipen=999) # turn off scientific notations
options(contrasts = c('contr.sum','contr.poly')) # set the contrast sum globally 
options(knitr.kable.NA = '')

R can be used as a calculator. For mathematical purposes, be careful of the order in which R executes the commands.

10 + 10
## [1] 20
4 ^ 2
## [1] 16
(250 / 500) * 100
## [1] 50

R is a bit flexible with spacing (but no spacing in the name of variables and words)

10+10
## [1] 20
10                 +           10
## [1] 20

R can sometimes tell that you’re not finished yet

10 +

How to create a variable? Variable assignment using <- and =. Note that R is case sensitive for everything

pay <- 250

month = 12

pay * month
## [1] 3000
salary <- pay * month

Few points in naming variables and vectors: use short, informative words, keep same method (e.g., not using capital words, use only _ or . ).

3.1 Function

Function is a set of statements combined together to perform a specific task. When we use a block of code repeatedly, we can convert it to a function. To write a function, first, you need to define it:

my_multiplier <- function(a,b){
  result = a * b
  return (result)
}

This code do nothing. To get a result, you need to call it:

my_multiplier (a=2, b=4)
## [1] 8
# or: my_multiplier (2, 4)

We can set a default value for our arguments:

my_multiplier2 <- function(a,b=4){
  result = a * b
  return (result)
}

my_multiplier2 (a=2)
## [1] 8
# or: my_multiplier (2)
# or: my_multiplier (2, 6)

Fortunately, you do not need to write everything from scratch. R has lots of built-in functions that you can use:

round(54.6787)
## [1] 55
round(54.5787, digits = 2)
## [1] 54.58

Use ? before the function name to get some help. For example, ?round. You will see many functions in the rest of the workshop.

3.2 Basic Data Types in R:

function class() is used to show what is the type of a variable.

  1. Logical: TRUE, FALSE can be abbreviated as T, F. They has to be capital, ‘true’ is not a logical data:
class(TRUE)
## [1] "logical"
class(F)
## [1] "logical"
  1. Numeric: all numbers e.g. 5, 10.5, 11,37; a special type of numeric is “integer” which is numbers without decimal. Integers are always numeric, but numeric is not always integer:
class(2)
## [1] "numeric"
class(13.46)
## [1] "numeric"
  1. Character: text for example, “I love R” or “4” or “4.5”:
class("ha ha ha ha")
## [1] "character"
class("56.6")
## [1] "character"
class("TRUE")
## [1] "character"

Can we change the type of data in a variable? Yes, you need to use the function as.---()

as.numeric(TRUE)
## [1] 1
as.character(4)
## [1] "4"
as.numeric("4.5")
## [1] 4.5
as.numeric("Hello")
## Warning: NAs introduced by coercion
## [1] NA

3.3 Data Structures in R

Vector: when there are more than one number or letter stored. Use the combine function c() for that.

sale <- c(1, 2, 3,4, 5, 6, 7, 8, 9, 10) # also sale <- c(1:10)

sale <- c(1:10)

sale * sale
##  [1]   1   4   9  16  25  36  49  64  81 100

Subsetting a vector:

days <- c("Saturday", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

days[2]
## [1] "Sunday"
days[-2]
## [1] "Saturday"  "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"
days[c(2, 3, 4)]
## [1] "Sunday"  "Monday"  "Tuesday"

3.3.1 Exercise

Create a vector named my_vector with numbers from 0 to 1000 in it:

my_vector <- (0:1000)

mean(my_vector)
## [1] 500
median(my_vector)
## [1] 500
min(my_vector)
## [1] 0
range(my_vector)
## [1]    0 1000
class(my_vector)
## [1] "integer"
sum(my_vector)
## [1] 500500
sd(my_vector)
## [1] 289.1081

List: allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other list.

my_list = list(sale, 1, 3, 4:7, "HELLO", "hello", FALSE)
my_list
## [[1]]
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## [[2]]
## [1] 1
## 
## [[3]]
## [1] 3
## 
## [[4]]
## [1] 4 5 6 7
## 
## [[5]]
## [1] "HELLO"
## 
## [[6]]
## [1] "hello"
## 
## [[7]]
## [1] FALSE

Factor: Factors store the vector along with the distinct values of the elements in the vector as labels. The labels are always character irrespective of whether it is numeric or character. For example, variable gender with “male” and “female” entries:

gender <- c("male", "male", "male", " female", "female", "female")
gender <- factor(gender)

R now treats gender as a nominal (categorical) variable: 1=female, 2=male internally (alphabetically).

summary(gender)
##  female  female    male 
##       1       2       3

Question: why when we ran the above function i.e. summary(), it showed three and not two levels of the data? Hint: run ‘gender’.

gender
## [1] male    male    male     female female  female 
## Levels:  female female male

So, be careful of spaces!

3.3.2 Exercise

Create a gender factor with 30 male and 40 females (Hint: use the rep() function):

gender <- c(rep("male",30), rep("female", 40))
gender <- factor(gender)
gender
##  [1] male   male   male   male   male   male   male   male   male   male  
## [11] male   male   male   male   male   male   male   male   male   male  
## [21] male   male   male   male   male   male   male   male   male   male  
## [31] female female female female female female female female female female
## [41] female female female female female female female female female female
## [51] female female female female female female female female female female
## [61] female female female female female female female female female female
## Levels: female male

There are two types of categorical variables: nominal and ordinal. How to create ordered factors (when the variable is nominal and values can be ordered)? We should add two additional arguments to the factor() function: ordered = TRUE, and levels = c("level1", "level2"). For example, we have a vector that shows participants’ education level.

edu<-c(3,2,3,4,1,2,2,3,4)

education<-factor(edu, ordered = TRUE)
levels(education) <- c("Primary school","high school","College","Uni graduated")
education
## [1] College        high school    College        Uni graduated 
## [5] Primary school high school    high school    College       
## [9] Uni graduated 
## Levels: Primary school < high school < College < Uni graduated

3.3.3 Exercise

We have a factor with patient and control values. Here, the first level is control and the second level is patient. Change the order of levels, so patient would be the first level:

health_status <- factor(c(rep('patient',5),rep('control',5)))
health_status
##  [1] patient patient patient patient patient control control control
##  [9] control control
## Levels: control patient
health_status_reordered <- factor(health_status, levels = c('patient','control'))
health_status_reordered
##  [1] patient patient patient patient patient control control control
##  [9] control control
## Levels: patient control

Finally, can you relabel both levels to uppercase characters? (Hint: check ?factor)

health_status_relabeled <- factor(health_status, levels = c('patient','control'), labels = c('Patient','Control'))
health_status_relabeled
##  [1] Patient Patient Patient Patient Patient Control Control Control
##  [9] Control Control
## Levels: Patient Control

Matrices: All columns in a matrix must have the same mode(numeric, character, etc.) and the same length. It can be created using a vector input to the matrix function.

my_matrix = matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3)

my_matrix
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

Data frames: (two-dimensional objects) can hold numeric, character or logical values. Within a column all elements have the same data type, but different columns can be of different data type. Let’s create a dataframe:

id <- 1:200
group <- c(rep("Psychotherapy", 100), rep("Medication", 100))
response <- c(rnorm(100, mean = 30, sd = 5),
             rnorm(100, mean = 25, sd = 5))

my_dataframe <-data.frame(Patient = id,
                          Treatment = group,
                          Response = response)

We also could have done the below

my_dataframe <-data.frame(Patient = c(1:200),
                          Treatment = c(rep("Psychotherapy", 100), rep("Medication", 100)),
                          Response = c(rnorm(100, mean = 30, sd = 5),
                                       rnorm(100, mean = 25, sd = 5)))

In large data sets, the function head() enables you to show the first observations of a data frames. Similarly, the function tail() prints out the last observations in your data set.

head(my_dataframe) 
tail(my_dataframe)
Patient Treatment Response
1 Psychotherapy 31.46027
2 Psychotherapy 25.80691
3 Psychotherapy 23.90582
4 Psychotherapy 35.24277
5 Psychotherapy 36.12894
6 Psychotherapy 28.88709
Patient Treatment Response
195 195 Medication 29.98481
196 196 Medication 17.56606
197 197 Medication 27.15201
198 198 Medication 25.47396
199 199 Medication 25.41576
200 200 Medication 26.85049

Similar to vectors and matrices, brackets [] are used to selects data from rows and columns in data.frames:

my_dataframe[35, 3]
## [1] 24.10207

3.3.4 Exercise

How can we get all columns, but only for the first 10 participants?

my_dataframe[1:10, ]
Patient Treatment Response
1 Psychotherapy 31.46027
2 Psychotherapy 25.80691
3 Psychotherapy 23.90582
4 Psychotherapy 35.24277
5 Psychotherapy 36.12894
6 Psychotherapy 28.88709
7 Psychotherapy 28.13632
8 Psychotherapy 26.74151
9 Psychotherapy 28.07995
10 Psychotherapy 25.66144

How to get only the Response column for all participants?

my_dataframe[ , 3]
##   [1] 31.46027 25.80691 23.90582 35.24277 36.12894 28.88709 28.13632
##   [8] 26.74151 28.07995 25.66144 32.43043 32.90031 28.22010 20.49055
##  [15] 21.60086 29.88247 28.37732 28.33941 32.99115 35.60892 28.20174
##  [22] 31.02228 37.67104 26.73629 23.17044 29.62874 27.47495 29.36538
##  [29] 36.66284 37.21628 33.22692 28.58497 29.40086 30.05996 24.10207
##  [36] 37.68405 34.11073 35.26914 35.77968 23.37928 29.54027 26.18844
##  [43] 38.75336 29.88913 35.19795 32.72946 29.27546 31.04248 33.29102
##  [50] 18.01849 29.91320 30.06106 30.93769 29.09398 31.92382 31.01322
##  [57] 26.27603 28.00506 29.75021 32.14493 24.69866 31.15726 22.59687
##  [64] 33.51499 20.87766 24.72480 28.88959 23.23752 22.79437 25.97967
##  [71] 40.90827 30.38728 38.08255 37.16809 25.73616 25.32099 28.77482
##  [78] 34.21075 33.13974 29.55064 23.93372 30.95174 28.30632 37.92416
##  [85] 27.14505 19.64008 34.20583 44.02146 29.18339 35.25485 39.78779
##  [92] 32.02400 33.92133 31.74780 22.73999 31.02149 24.51421 28.80851
##  [99] 37.00935 39.01414 24.46241 33.60924 26.14423 19.96034 24.18702
## [106] 25.42803 26.15863 29.76837 25.03232 25.53612 26.98493 23.44438
## [113] 14.90449 28.02894 24.57431 22.30127 34.10020 20.91610 33.11796
## [120] 37.12811 22.89906 31.94684 27.52520 22.31867 13.92906 19.25865
## [127] 30.37195 24.67580 26.37599 24.65865 28.30528 23.66096 17.49513
## [134] 23.04172 15.84645 18.23439 26.63121 29.81582 32.55382 25.66231
## [141] 28.64869 23.77102 32.00628 25.16474 32.48781 25.98615 22.57022
## [148] 19.36977 28.48540 26.29446 21.81714 24.45360 17.95169 23.09270
## [155] 21.38275 27.08464 18.66920 27.60698 27.70646 24.17864 22.87044
## [162] 24.70419 19.10310 21.52611 25.94191 26.26336 27.21419 26.85767
## [169] 17.14670 24.41177 31.19198 24.97077 24.81557 21.71635 26.72816
## [176] 24.91654 28.13910 21.55515 19.29760 26.87387 23.21848 26.19222
## [183] 25.00059 33.82250 27.32508 35.48514 32.61955 32.35353 10.76799
## [190] 15.55515 23.96873 23.10777 26.09092 21.18117 29.98481 17.56606
## [197] 27.15201 25.47396 25.41576 26.85049

Another easier way for selecting particular items is using their names that is more helpful than number of the rows in large data sets:

my_dataframe[ , "Response"]
# OR:
my_dataframe$Response

4 Data Cleaning

Now, suppose we tested 141 students. First, let’s read and check the uncleaned data:

# read the raw data
raw_data <- read_csv(here("raw_data","raw_data_exp1.csv"))
head(raw_data)
end_date status ip_address progress duration_in_seconds subject recorded_date response_id location_latitude location_longitude distribution_channel user_language consent_form age gender stage1_simple stage2_simple stage3_simple stage4_simple stage5_simple stage6_simple stage7_simple stage1_complex stage2_complex stage3_complex stage4_complex stage5_complex stage6_complex stage7_complex thinking1 thinking2 thinking3 openminded1 openminded2 openminded3 openminded4 openminded5 openminded6 openminded7 openminded8 group numeracy_total reasoning_total
24/9/20 22:02 IP Address 202.7.193.64 100 1517 subj1 24/9/20 22:02 R_1f298znjmVzcOjp -33.85910 151.2002 anonymous EN I consent 18 Female 36 70 68 54 51 43 41 8 50 20 5 5 5 5 5 6 5 3 Complex 9 9
25/9/20 3:23 IP Address 220.245.220.94 100 1131 subj2 25/9/20 3:23 R_tL0A9P33Gi18I0N -34.03680 150.6672 anonymous EN I consent 18 Male 50 55 55 90 75 50 35 8 10 39 6 6 6 5 6 6 5 6 Simple 9 10
27/9/20 22:59 IP Address 121.210.0.211 100 709 subj3 27/9/20 22:59 R_1LNyJhCKxTAAMOW -33.85910 151.2002 anonymous EN I consent 19 Female 50 50 77 60 25 20 13 8 50 20 6 5 4 5 5 6 5 6 Simple 10 8
27/9/20 23:18 IP Address 58.179.100.109 100 949 subj4 27/9/20 23:18 R_3enxzUsEYgs5r1a -12.63921 141.8741 anonymous EN I consent 27 Female 70 80 90 95 70 80 90 8 50 20 6 6 6 1 6 6 6 1 Complex 8 7
28/9/20 0:45 IP Address 120.154.53.68 100 1097 subj5 28/9/20 0:45 R_2Qzl2096a4KNE29 -33.85910 151.2002 anonymous EN I consent 19 Male 71 73 85 95 95 32 32 4 10 39 6 6 5 5 6 6 6 6 Simple 11 11
28/9/20 2:20 IP Address 1.129.107.6 100 880 subj6 28/9/20 2:20 R_esb71WOTQySjusF -33.85910 151.2002 anonymous EN I consent 20 Female 89 100 44 55 100 50 55 8 50 20 6 6 6 5 6 6 6 6 Complex 10 10

4.0.1 Exercise

There is a dataset in the cleaned_data folder named unicef_u5mr.csv. Read the dataset using read_csv and here.

unicef_data <- read_csv(here("cleaned_data","unicef_u5mr.csv"))

In order to clean the data, we use tidyverse which is a collection of packages to work with data. One of the tidyverse packages that we use regularly is dplyr which includes several functions:

  • mutate() adds new variables or change existing ones.
  • select() pick variables (columns) based on their names.
  • filter() picks cases (rows) based on their values.
  • summarise() gives a single single summary of the data (e.g., mean, counts, etc.)
  • arrange() changes the ordering of the rows.
  • group_by() divides your dataframe into grouped dataframes and allow you to do each of the above operations (except for arrange) on every one of them separately.

Pick subject, age, and gender columns:

selected_data <- select(raw_data, subject, age, gender)

Now, do the following tasks: pick all the male participants, pick all the male participants or those greater than 25 years old, and finally pick all male participants and those greater than 25 years old:

# filter all males
filtered_data_male <- filter(raw_data, gender == "Male")
# filter males and older than 25
filtered_data_male_and_greater25 <- filter(raw_data, gender == "Male" & age > 25 )
# filter males or older than 25
filtered_data_male_or_greater25 <- filter(raw_data, gender == "Male" | age > 25 )

Arrange (order) your dataframe based on the age, once in an ascending order (youngers first) and once based on descending order (olders first):

# order participants based on their age
arranged_data <- arrange(raw_data, age)
# order participants based on their age (descendeing)
arranged_data_descending <- arrange(raw_data, desc(age))

Create a column to show if the participant has finished the task or not:

mutated_data <- mutate (raw_data, finished= case_when(progress==100~ "Yes",T~ "No"))

Summarize participants age and sd:

summarise(raw_data, mean= mean(age, na.rm=T),
          sd= sd (age, na.rm=T))
mean sd
21.27273 6.635655

A new function: pipe operators %>% pipes a value into the next function:

raw_data %>% 
  summarise(., mean= mean(age, na.rm=T),
            sd= sd (age, na.rm=T))
mean sd
21.27273 6.635655
raw_data %>% 
  summarise(mean= mean(age, na.rm=T),
            sd= sd (age, na.rm=T))
mean sd
21.27273 6.635655

Calculate the age mean of younger than 25 participants only:

raw_data %>% 
  filter (age < 25) %>%
  summarise(mean= mean(age, na.rm=T),
            sd= sd (age, na.rm=T))
mean sd
19.1913 1.515393

Calculate the age mean of younger than 25 participants for each gender separately:

raw_data %>% 
  filter (age < 25) %>%
  group_by(gender) %>%
  summarise(mean= mean(age, na.rm=T),
            sd= sd (age, na.rm=T)) %>%
  ungroup ()
gender mean sd
Female 19.21053 1.556693
Male 19.10000 1.333772

4.0.2 Exercise

  1. Create a column to show if participant is older than 23 or not and then calculate reasoning ability (reasoning_total) mean for each group separately:
raw_data %>%
  mutate(age_group = case_when(age > 23 ~ "greater than 23", T~ "younger than 23")) %>%
  group_by(age_group) %>%
  summarise(reasoning_total = mean(reasoning_total, na.rm=T))
age_group reasoning_total
greater than 23 8.800000
younger than 23 7.842975
  1. Add the open_mindedness total score (sum) to the dataframe and then convert subject column to factor:
mutated_openmind_data <- raw_data %>%
  mutate(openminded_total= openminded1+openminded2+openminded3+openminded4+openminded5+openminded6+openminded7+openminded8) %>%
  mutate(subject= factor(subject))

Next, we want to pivot our data to switch between long and wide format:

# pivoting your data


# Make you data long
long_data <- raw_data %>%
  select(subject, stage1_simple:stage7_simple,stage1_complex:stage7_complex) %>%
  pivot_longer(cols = c(stage1_simple:stage7_complex), names_to = 'stage', values_to = 'truth_estimate')


# Make you data wide
wide_data <- long_data %>%
  pivot_wider(names_from = stage, values_from= truth_estimate)

4.0.3 Exercise

Convert the UNICEF dataset to long and wide formats:

unicef_data <- read_csv(here("cleaned_data","unicef_u5mr.csv"))

library(janitor)
unicef_data_cleaned <- unicef_data %>%
  clean_names()

unicef_long_data <- unicef_data_cleaned %>% pivot_longer(cols = c(u5mr_1950:u5mr_2015), names_to = 'year', values_to = 'u5mr')
unicef_wideg_data <- unicef_long_data %>% pivot_wider(names_from = 'year', values_from = 'u5mr')

Note: The codes for the previous exercise were taken from this blog post written by Simon Ejdemyr.

Now, let’s do some cleaning using dplyr, tidyr and other tidyverse libraries:

cleaned_data <- raw_data %>% 
  filter(progress == 100) %>% # filter out unfinished participants
  select(-end_date, -status,-ip_address, -duration_in_seconds, -recorded_date:-user_language) %>% #remove some useless columns
  mutate(openminded_total= openminded1+openminded2+openminded3+openminded4+openminded5+openminded6+openminded7+openminded8) %>%# create a total score for our questionnaire
  mutate(thinking1= case_when(thinking1==4~ 1,T~0),
         thinking2= case_when(thinking2==10~ 1,T~0),
         thinking3= case_when(thinking3==39~ 1,T~0),
         thinking_total= thinking1 + thinking2 + thinking3) %>%
  select(-thinking1:-openminded8) %>%
  pivot_longer(cols = c(stage1_simple:stage7_simple,stage1_complex:stage7_complex),names_to = 'stage',values_to = 'truth_estimate') %>% # make our dataframe long
  #pivot_wider(names_from = stage, values_from= truth_estimate) # this code change our dataframe back to wide
  filter(!is.na(truth_estimate)) %>% #remove rows with truth_estimate == NA
  mutate(stage= gsub("_.*", "", stage)) %>%
  rename(consent= consent_form) # rename a column
progress subject consent age gender group numeracy_total reasoning_total openminded_total thinking_total stage truth_estimate
100 subj1 I consent 18 Female Complex 9 9 39 0 stage1 36
100 subj1 I consent 18 Female Complex 9 9 39 0 stage2 70
100 subj1 I consent 18 Female Complex 9 9 39 0 stage3 68
100 subj1 I consent 18 Female Complex 9 9 39 0 stage4 54
100 subj1 I consent 18 Female Complex 9 9 39 0 stage5 51
100 subj1 I consent 18 Female Complex 9 9 39 0 stage6 43

Ok, now the data is clean and tidy which means:

  1. Each variable forms a column.
  2. Each observation forms a row.
  3. Each type of observational unit forms a table (Wickham, 2014).

Check the dataframe and all the data types:

str(cleaned_data)
## tibble [917 × 12] (S3: tbl_df/tbl/data.frame)
##  $ progress        : num [1:917] 100 100 100 100 100 100 100 100 100 100 ...
##  $ subject         : chr [1:917] "subj1" "subj1" "subj1" "subj1" ...
##  $ consent         : chr [1:917] "I consent" "I consent" "I consent" "I consent" ...
##  $ age             : num [1:917] 18 18 18 18 18 18 18 18 18 18 ...
##  $ gender          : chr [1:917] "Female" "Female" "Female" "Female" ...
##  $ group           : chr [1:917] "Complex" "Complex" "Complex" "Complex" ...
##  $ numeracy_total  : num [1:917] 9 9 9 9 9 9 9 9 9 9 ...
##  $ reasoning_total : num [1:917] 9 9 9 9 9 9 9 10 10 10 ...
##  $ openminded_total: num [1:917] 39 39 39 39 39 39 39 46 46 46 ...
##  $ thinking_total  : num [1:917] 0 0 0 0 0 0 0 2 2 2 ...
##  $ stage           : chr [1:917] "stage1" "stage2" "stage3" "stage4" ...
##  $ truth_estimate  : num [1:917] 36 70 68 54 51 43 41 50 55 55 ...

Finally, we save our data to the cleaned_data folder.

write_csv(cleaned_data, here("cleaned_data","cleaned_data_exp1.csv"))

5 Descriptive Statistics

Note: All the data that we use here is manipulated (fabricated) for teaching purpuses. In our study, we failed to find such beautiful and interesting results.

Now, let’s do some descriptive statistics. Now, we can open a new script called data_analysis.r and read some datasets. Then we use skimr package to describe our data.

narcissism_data <- read_csv(here("cleaned_data","narcissism_data.csv"))
narcissism_data %>% skimr::skim()
Data summary
Name Piped data
Number of rows 131
Number of columns 5
_______________________
Column type frequency:
character 1
numeric 4
________________________
Group variables

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
subject 0 1 5 7 0 131 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
psychopathy 0 1 8.78 2.27 0 8.0 10 10 11 ▁▁▁▂▇
self_esteem 0 1 8.45 1.68 4 8.0 8 9 12 ▁▅▇▆▃
narcissism 0 1 38.20 6.15 19 33.5 39 43 48 ▁▂▇▇▆
mental_health 0 1 3.19 1.04 1 3.0 4 4 4 ▂▂▁▃▇

5.0.1 Exercise

  1. Open the dataset called treatment_data.csv in the cleaned_data folder and do a descriptive analysis:
treatment_data <- read_csv(here("cleaned_data","treatment_data.csv"))
treatment_data %>% skimr::skim()
Data summary
Name Piped data
Number of rows 131
Number of columns 7
_______________________
Column type frequency:
character 3
numeric 4
________________________
Group variables

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
subject 0 1 5 7 0 131 0
gender 0 1 4 6 0 2 0
treatment 0 1 3 13 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
age 0 1 21.15 6.52 16 18.0 19 20.0 63 ▇▁▁▁▁
anxiety 0 1 62.35 24.51 0 40.0 69 81.0 100 ▂▆▃▇▆
depression 0 1 52.50 22.12 0 34.5 51 71.0 100 ▂▇▇▆▃
life_satisfaction 0 1 41.02 23.93 0 21.0 39 56.5 100 ▅▇▅▃▂
  1. Do the same thing for the memory_data.csv.
memory_data <- read_csv(here("cleaned_data","memory_data.csv"))
memory_data %>% group_by(time) %>%
  skimr::skim()
Data summary
Name Piped data
Number of rows 262
Number of columns 5
_______________________
Column type frequency:
character 2
numeric 2
________________________
Group variables time

Variable type: character

skim_variable time n_missing complete_rate min max empty n_unique whitespace
subject post_test_memory 0 1 5 7 0 131 0
subject pre_test_memory 0 1 5 7 0 131 0
gender post_test_memory 0 1 4 6 0 2 0
gender pre_test_memory 0 1 4 6 0 2 0

Variable type: numeric

skim_variable time n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
age post_test_memory 0 1 21.15 6.52 16 18.0 19 20.0 63 ▇▁▁▁▁
age pre_test_memory 0 1 21.15 6.52 16 18.0 19 20.0 63 ▇▁▁▁▁
memory_score post_test_memory 0 1 52.50 22.12 0 34.5 51 71.0 100 ▂▇▇▆▃
memory_score pre_test_memory 0 1 41.02 23.93 0 21.0 39 56.5 100 ▅▇▅▃▂

Now, let’s describe our experiment data. Be careful, we need some data reshaping before description:

data_exp1_orig <- read_csv(here("cleaned_data","cleaned_data_exp1.csv"))


data_exp1 <- data_exp1_orig%>% 
  #mutate_if(is.character, factor) %>%
  mutate(subject= factor(subject), # convert all characters to factor
         group = factor(group),
         stage = factor(stage))

How many participants in total?

data_exp1 %>% summarise(n= n_distinct(subject))
n
131

how many participants in each group?

data_exp1 %>% 
  group_by(subject) %>% 
  filter(row_number()==1) %>% 
  ungroup () %>% 
  group_by(group) %>% 
  count() 
group n
Complex 65
Simple 66

Find the mean and sd for numeric variables using base R summary function:

data_exp1 %>% 
  group_by(subject) %>% 
  filter(row_number()==1) %>% 
  ungroup () %>%
  summary()
##     progress      subject      consent               age       
##  Min.   :100   subj1  :  1   Length:131         Min.   :16.00  
##  1st Qu.:100   subj10 :  1   Class :character   1st Qu.:18.00  
##  Median :100   subj101:  1   Mode  :character   Median :19.00  
##  Mean   :100   subj102:  1                      Mean   :21.15  
##  3rd Qu.:100   subj103:  1                      3rd Qu.:20.00  
##  Max.   :100   subj104:  1                      Max.   :63.00  
##                (Other):125                                     
##     gender              group    numeracy_total   reasoning_total
##  Length:131         Complex:65   Min.   : 0.000   Min.   : 4.00  
##  Class :character   Simple :66   1st Qu.: 8.000   1st Qu.: 8.00  
##  Mode  :character                Median :10.000   Median : 8.00  
##                                  Mean   : 8.779   Mean   : 8.45  
##                                  3rd Qu.:10.000   3rd Qu.: 9.00  
##                                  Max.   :11.000   Max.   :12.00  
##                                                                  
##  openminded_total thinking_total      stage     truth_estimate  
##  Min.   :19.0     Min.   :0.0000   stage1:131   Min.   :  0.00  
##  1st Qu.:33.5     1st Qu.:0.0000   stage2:  0   1st Qu.: 43.50  
##  Median :39.0     Median :0.0000   stage3:  0   Median : 61.00  
##  Mean   :38.2     Mean   :0.8092   stage4:  0   Mean   : 57.09  
##  3rd Qu.:43.0     3rd Qu.:1.0000   stage5:  0   3rd Qu.: 75.50  
##  Max.   :48.0     Max.   :3.0000   stage6:  0   Max.   :100.00  
##                                    stage7:  0

Alternatively, we can use base Rsummaryfunctionskimr` library:

data_exp1 %>% 
  group_by(subject) %>% 
  filter(row_number()==1) %>% 
  ungroup () %>% 
  dplyr::select (age, numeracy_total, reasoning_total, openminded_total, thinking_total) %>% 
  skimr::skim()
skim_type skim_variable n_missing complete_rate numeric.mean numeric.sd numeric.p0 numeric.p25 numeric.p50 numeric.p75 numeric.p100 numeric.hist
numeric age 0 1 21.1526718 6.515630 16 18.0 19 20 63 ▇▁▁▁▁
numeric numeracy_total 0 1 8.7786260 2.274576 0 8.0 10 10 11 ▁▁▁▂▇
numeric reasoning_total 0 1 8.4503817 1.683466 4 8.0 8 9 12 ▁▅▇▆▃
numeric openminded_total 0 1 38.1984733 6.153698 19 33.5 39 43 48 ▁▂▇▇▆
numeric thinking_total 0 1 0.8091603 1.038598 0 0.0 0 1 3 ▇▃▁▂▂

5.0.2 Exercise

For this exercise, we use a dataset of one of my own studies. In this study, we asked participants to guess the physical brightness of reasoning arguments and then we gave a cognitive ability test. (See the original study here). Open ghasemi_brightness_exp4.csv file and answer to the following questions:

  1. How many participants did we test in total?
  2. Find out how many male and female we tested.
  3. Calculate mean and sd for age and cognitive ability (cog_ability).
ghasemi_data <- read_csv(here("cleaned_data","ghasemi_brightness_exp4.csv"))

ghasemi_data %>% summarise(n = n_distinct(participant)) # number of participants:200
n
200
ghasemi_data %>% group_by (participant) %>% filter (row_number()==1) %>% group_by (gender) %>% summarise(n= n()) %>% ungroup() # 183 female, 17 male
gender n
Female 183
Male 17
ghasemi_data %>% dplyr::select (age, cog_ability) %>% skimr::skim() # mean and sd for age and cognitive ability
Data summary
Name Piped data
Number of rows 38400
Number of columns 2
_______________________
Column type frequency:
numeric 2
________________________
Group variables

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
age 0 1 22.20 6.78 17 19 20 22 52 ▇▁▁▁▁
cog_ability 0 1 39.55 9.46 11 34 40 46 61 ▁▃▇▆▂

6 Data Visualization

Before starting the ggplot, lets try a visualization using a function from the Base R the plot() function shows the association of each variable against the other one in a data handy for data with few number of variables to see if there are any patterns

exam_data<- read_csv(here::here("cleaned_data", "exam_data.csv"))

plot(x = exam_data$Anxiety, y = exam_data$Exam)

The code also works without writing x and y, however, writing them is strongly recommended

plot(exam_data$Anxiety, exam_data$Exam)

ggplot, the gg in ggplot stands for grammar of graphics. Grammar of graphics basically says any graphical representation of data, can be produced by a series of layers. You can think of a layer as a plastic transparency. Lets draw the same plot using ggplot. Always, mention the data we are going to work with.

ggplot(data = exam_data, aes(x = Exam, y = Anxiety))

  • aes: aes which stands for aesthetics is a relationship between a variable in your dataset and an aspect of the plot that is going to visually convey the information to the reader

  • Visual elements are known as geoms (short for ‘geometric objects’) in ggplot 2. When we define a layer, we have to tell R what geom we want displayed on that layer (do we want a bar, line dot, etc.?)

ggplot(data = exam_data, aes(x = Exam, y = Anxiety))+ geom_point()

So, lets try some of them here like shape and size. Be careful with the + sign, if you clink enter for the next part of the code, the + sign should not go to the next line

ggplot(data = exam_data, aes(x = Exam, y = Anxiety))+
  geom_point(size = 2, shape = 8)

The current plot is not very informative about the patterns for each gender.

ggplot(data = exam_data, aes(x = Exam, y = Anxiety, color = Gender))+
  geom_point(size = 2, shape = 10)

ggplot(data = exam_data, aes(x = Exam, y = Anxiety, color = Gender, shape = Gender))+
  geom_point(size = 2, shape = 10)

Question: why the above code doesn’t make any change?

ggplot(data = exam_data, aes(x = Exam, y = Anxiety, color = Gender, shape = Gender))+
  geom_point(size = 2)

Can assign the first layer to a variable to reduce the length of codes for next layers.

My_graph <- ggplot(data = exam_data, aes(x = Exam, y = Anxiety))

My_graph + geom_point()

lets add a line to the current graph

My_graph + geom_point() + geom_smooth()

Aesthetics can be set for all layers of the plot (i.e., defined in the plot as a whole) or can be set individually for each geom in a plot.

My_graph + geom_point(aes(color = Gender)) + geom_smooth()

My_graph + geom_point(aes(color = Gender)) + geom_smooth(aes(color = Gender))

The shaded area around the line is the 95% confidence interval around the line. We can switch this off by adding se = F (which is short for ‘standard error = False’)

My_graph + geom_point() + geom_smooth(se = F)

What if we want our line to be a direct line?

My_graph + geom_point() + geom_smooth(se = F, method = lm)

How to change the labels of x and y axes?

My_graph + geom_point() + geom_smooth(se = F, method = lm) +
  labs(x = "Exam scores %", y = "Anxiety scores")

Histograms are used to show distributions of variables while bar charts are used to compare variables. Histograms plot quantitative data with ranges of the data grouped into bins or intervals while bar charts plot categorical data.

#ggplot(data = exam_data, aes(x = Anxiety, y = Exam )) + geom_histogram()
# the code above gives an error as geom_histogram can only have x or y axis in its aes()

ggplot(data = exam_data, aes(x = Anxiety)) + geom_histogram()

ggplot(data = exam_data, aes(y = Anxiety)) + geom_histogram()

ggplot(data = exam_data, aes(x = Anxiety)) + geom_histogram(bins = 31)

ggplot(data = exam_data, aes(x = Anxiety)) + geom_histogram(bins = 31, fill = "green")

ggplot(data = exam_data, aes(x = Anxiety)) + geom_histogram(bins = 31, fill = "green", col = "red")

Let’s stop using the My_graph variable and write the whole code from the start again for a bar chart

ggplot(data = exam_data, aes(x = Sleep_quality))+
  geom_bar()

Because we want to plot a summary of the data (the mean) rather than the raw scores themselves, we have to use a stat.

ggplot(data = exam_data, aes(x = Sleep_quality, y = Exam, fill = Gender))+
  geom_bar(stat = "summary", fun = "mean")

ggplot(data = exam_data, aes(x = Sleep_quality, y = Exam, fill = Gender))+
  geom_bar(stat = "summary", fun = "mean", position = "dodge")

The other way to get the same plot that the code above gives, is using the stat_summary function that takes the following general form: stat_summary(function = x, geom = y)

ggplot(data = exam_data, aes(x = Sleep_quality, y = Exam, fill = Gender))+
  stat_summary(fun = mean, geom = "bar", position = "dodge")

How to combine multiple plots? How to combine multiple plots? We can use the patchwork package. A nice tutorial on using this package can be found here

p1 = My_graph + geom_point(aes(color = Gender)) + geom_smooth()

p2 = ggplot(data = exam_data, aes(x = Anxiety)) + geom_histogram(bins = 31)

p3 = ggplot(data = exam_data, aes(x = Sleep_quality, y = Exam, fill = Gender))+
  stat_summary(fun = mean, geom = "bar", position = "dodge")

p4 = My_graph + geom_point() + geom_smooth(se = F, method = lm) +
  labs(x = "Exam scores %", y = "Anxiety scores")

combined = p1 + p2+ p3 + p4 + plot_layout(nrow = 4, byrow = F)

combined

p1 | p2 / p3 / p4

p1 | p2 / (p3 / p4)

ggsave() function, which is a versatile exporting function that can export as PostScript (.eps/.ps), tex (pictex), pdf, jpeg, tiff, png, bmp, svg and wmf (in Windows only). In its basic form, the structure of the function is very simple: ggsave(filename)

ggsave(combined, filename = here("outputs", "combined.png"), dpi=300)

Now that we learned the basics of ggplot, let’s draw some plot for our experiment data. First, we need to create a dataset with aggregated truth estimate scores over group and stage. We will use this dataset for line and bar graphs.

aggregated_data_exp1 <- data_exp1 %>%
  group_by(stage, group) %>%
  mutate(truth_estimate = mean(truth_estimate)) %>%
  ungroup()

barplot_exp1 <- aggregated_data_exp1 %>%
  ggplot(aes(x=stage, y= truth_estimate, fill=group)) +
  geom_bar(stat = "identity", position= "dodge")+
  # stat_summary(fun= mean, geom = "bar", position = "dodge")+ # can be used instead of geom_bar() for long dataframes
  labs (x= '', y= "Truth Likelihhod Estimate") + 
  theme_bw() + 
  scale_fill_jama() 

barplot_exp1

barplot_facet_exp1 <- aggregated_data_exp1 %>%
  ggplot(aes(x=group, y= truth_estimate, fill=stage)) +
  geom_bar(stat = "identity", position= "dodge")+
  labs (x= '', y= "Truth Likelihhod Estimate") + 
  theme_bw() + 
  theme(legend.position = "none",
        axis.text=element_text(size=11),
        axis.title = element_text(size = 12)) +
  facet_wrap(~stage)+
  scale_fill_jco() 

barplot_facet_exp1

lineplot_exp1 <- aggregated_data_exp1 %>%
  ggplot(aes(x=factor(stage), y= truth_estimate, group= group, color= group)) +
  geom_line(aes(linetype= group)) +
  geom_point(size= 5)+
  labs (x= '', y= "Truth Likelihhod Estimate") + 
  theme_classic() +
  theme(legend.position = "bottom",
        axis.text=element_text(size=11),
        axis.title = element_text(size = 12)) +
  scale_color_nejm() 

lineplot_exp1

violinplot_exp1 <- data_exp1 %>%
  ggplot(aes(x=factor(stage), y= truth_estimate, fill= group)) +
  geom_violin()+
  labs (x= '', y= "Truth Likelihhod Estimate") + 
  theme_bw() + 
  theme(legend.position = "bottom",
        axis.text=element_text(size=11),
        axis.title = element_text(size = 12)) +
  scale_fill_d3() 

violinplot_exp1

boxplot_exp1 <- data_exp1 %>%
  ggplot(aes(x=factor(stage), y= truth_estimate, fill= group)) +
  geom_boxplot()+
  #geom_point(position = position_dodge(width=0.75), alpha= .5)+
  labs (x= '', y= "Truth Likelihhod Estimate") + 
  theme_bw() + 
  theme(legend.position = "bottom",
        axis.text=element_text(size=11),
        axis.title = element_text(size = 12)) +
  scale_fill_simpsons() 

boxplot_exp1

boxplot_facet_exp1 <- data_exp1 %>%
  ggplot(aes(x=factor(stage), y= truth_estimate, fill= group)) +
  geom_boxplot()+
  labs (x= '', y= "Truth Likelihhod Estimate") + 
  theme_bw() + 
  theme(legend.position = "bottom",
        axis.text=element_text(size=11),
        axis.title = element_text(size = 12),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  facet_wrap(~group)+
  scale_color_simpsons() 

boxplot_facet_exp1

HLet’s combine our plots:

combined_plot_exp1 <- (barplot_facet_exp1+lineplot_exp1) / (violinplot_exp1+boxplot_exp1)
combined_plot_exp1

And here, we save our plots to the outputs folder.

ggsave(combined_plot_exp1, filename = here("outputs","combined_plot_exp1.png"), dpi=300)

7 Data Analysis

7.1 t-test

Now, we use the treatment data to run three different independent t-tests. Suppose we did an experiment to compare the effectiveness of CBT vs. Psychodynamic therapies in decreasing anxiety, and depression and also in improving life satisfaction:

# t.test (indep)
t.test(anxiety~treatment, data= treatment_data)
## 
##  Welch Two Sample t-test
## 
## data:  anxiety by treatment
## t = -0.85021, df = 124.18, p-value = 0.3968
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -12.11096   4.83264
## sample estimates:
##           mean in group CBT mean in group Psychodynamic 
##                    60.54545                    64.18462
t.test(depression~treatment, data= treatment_data)
## 
##  Welch Two Sample t-test
## 
## data:  depression by treatment
## t = -2.8725, df = 123.97, p-value = 0.004792
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.21965  -3.35424
## sample estimates:
##           mean in group CBT mean in group Psychodynamic 
##                    47.15152                    57.93846
t.test(life_satisfaction~treatment, data= treatment_data)
## 
##  Welch Two Sample t-test
## 
## data:  life_satisfaction by treatment
## t = -5.2688, df = 127.11, p-value = 0.0000005699
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -27.61850 -12.53721
## sample estimates:
##           mean in group CBT mean in group Psychodynamic 
##                    31.06061                    51.13846

In another experiment, suppose we have created a method to boost memory. Then, we recruit some participants, do a memory pre-test, implement the method, and do a memory post-test, Now, we want to see whether our method have improved participants’ memory:

# t.test (paired)
t.test(memory_score~time, data= memory_data, paired= T)
## 
##  Paired t-test
## 
## data:  memory_score by time
## t = 5.4761, df = 130, p-value = 0.0000002163
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   7.333171 15.628661
## sample estimates:
## mean of the differences 
##                11.48092

Now that we learned about t-test, let’s perform this test on our dataset. Is there a difference between groups at the first stage? Ideally, we want participants’ ratings at the first stage be similar for both groups because we have not done any manipulations. Previous graphs showed us that ratings of simple and complex group at this stage are pretty close. Let’s test that using an independent t-test (because we have 2 independent groups):

# Is there a difference between groups at the first stage?
data_exp1 %>% 
  group_by(group) %>% 
  filter(stage=='stage1') %>% 
  ungroup () %>%
  t.test(truth_estimate~group, data = ., paired=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  truth_estimate by group
## t = -0.75145, df = 104.95, p-value = 0.4541
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.883716   5.351781
## sample estimates:
## mean in group Complex  mean in group Simple 
##              55.44615              58.71212

Now, we wonder if opposing arguments were effective at all, regardless of participants’ group. So, we would like to test if ratings at the final stage are lower than ratings at the stage 4? Since a pair of score at stage 4 and stage 7 is coming from a same person, we use paired t-test.

# Is there a difference between ratings of stage4 and stage7?
data_exp1 %>% 
  filter(stage=='stage4' | stage=='stage7') %>% 
  ungroup () %>%
  t.test(truth_estimate~stage, data = ., paired=TRUE)
## 
##  Paired t-test
## 
## data:  truth_estimate by stage
## t = 12.788, df = 130, p-value < 0.00000000000000022
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  32.64368 44.59296
## sample estimates:
## mean of the differences 
##                38.61832

7.1.1 Exercise

John et al. (2019) investigated the consequences of backing down (changing one’s mind in lights of evidence)and how other people view someone who change their mind. In their second experiments, they presented participants either with a person who changes their mind or a person who refuses to back down. Then, they asked participants to rate how intelligent and confident the person is (See the original study here). They reported that:

“Relative to the entrepreneur who did not back down, participants judged the entrepreneur who backed down as more intelligent (M_backed_down=5.13 out of 7, SD=1.09; M_did_not_back_down=3.97, SD=1.54; t(271.12)=−7.59, p < .001) but less confident (M_backed_down=4.50 out of 7, SD=1.36; M_did_not_back_down=5.65, SD=1.10; t(291.01)=8.08, p < .001).”.

Open the john_backdown_exp2.csv file and try to reproduce their results. Run two separate independent t-test, one with intelligent as the dependent variable and one with confident as the dependent variable. For both t-test, use back_down as the between-subject independent variable.

john_data <- read_csv(here("cleaned_data","john_backdown_exp2.csv"))


t.test(intelligent~back_down, data = john_data, paired=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  intelligent by back_down
## t = 7.5853, df = 271.12, p-value = 0.0000000000005319
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.8577107 1.4590076
## sample estimates:
##       mean in group backed_down mean in group did_not_back_down 
##                        5.129412                        3.971053
t.test(confident~back_down, data = john_data, paired=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  confident by back_down
## t = -8.0763, df = 291.01, p-value = 0.00000000000001787
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.4257768 -0.8670294
## sample estimates:
##       mean in group backed_down mean in group did_not_back_down 
##                        4.503268                        5.649671

7.2 Analysis of Variance (ANOVA)

Now, let’s answer our main question: Do participants in the simple group show higher ratings for supportive arguments (stage 2 to 4) and lower ratings for opposing arguments (stage 5 to 7), compared to participants in the complex group? If this is the case. we expect an interaction in the traditional Analysis of Variance (AONVA) test.

aov_m1 <- aov_car (truth_estimate ~ group*stage +
                     Error(subject/stage), data = data_exp1)
Effect df MSE F ges p.value
group 1, 129 949.04 0.01 <.0001 .94
stage 4.45, 574.05 515.69 59.48 *** .25 <.0001
group:stage 4.45, 574.05 515.69 13.34 *** .07 <.0001

As you can see, we found a significant main effect of stage and a significant group by stage interaction. We can use the emmeans package to do post-hoc tests.

# main effect of stage
emmeans(aov_m1, 'stage')
##  stage  emmean   SE  df lower.CL upper.CL
##  stage1   57.1 1.88 763     53.4     60.8
##  stage2   66.8 1.88 763     63.1     70.5
##  stage3   74.6 1.88 763     70.9     78.3
##  stage4   79.6 1.88 763     75.9     83.3
##  stage5   62.4 1.88 763     58.7     66.1
##  stage6   52.5 1.88 763     48.9     56.2
##  stage7   41.1 1.88 763     37.4     44.8
## 
## Results are averaged over the levels of: group 
## Warning: EMMs are biased unless design is perfectly balanced 
## Confidence level used: 0.95
pairs(emmeans(aov_m1, 'stage'), adjust= 'holm')
##  contrast        estimate   SE  df t.ratio p.value
##  stage1 - stage2    -9.74 2.42 774 -4.031  0.0004 
##  stage1 - stage3   -17.53 2.42 774 -7.256  <.0001 
##  stage1 - stage4   -22.50 2.42 774 -9.311  <.0001 
##  stage1 - stage5    -5.29 2.42 774 -2.187  0.1160 
##  stage1 - stage6     4.53 2.42 774  1.876  0.1220 
##  stage1 - stage7    15.98 2.42 774  6.613  <.0001 
##  stage2 - stage3    -7.79 2.42 774 -3.225  0.0066 
##  stage2 - stage4   -12.76 2.42 774 -5.280  <.0001 
##  stage2 - stage5     4.46 2.42 774  1.844  0.1220 
##  stage2 - stage6    14.28 2.42 774  5.908  <.0001 
##  stage2 - stage7    25.72 2.42 774 10.644  <.0001 
##  stage3 - stage4    -4.97 2.42 774 -2.055  0.1206 
##  stage3 - stage5    12.25 2.42 774  5.069  <.0001 
##  stage3 - stage6    22.07 2.42 774  9.132  <.0001 
##  stage3 - stage7    33.51 2.42 774 13.869  <.0001 
##  stage4 - stage5    17.22 2.42 774  7.124  <.0001 
##  stage4 - stage6    27.04 2.42 774 11.188  <.0001 
##  stage4 - stage7    38.48 2.42 774 15.924  <.0001 
##  stage5 - stage6     9.82 2.42 774  4.064  0.0004 
##  stage5 - stage7    21.27 2.42 774  8.800  <.0001 
##  stage6 - stage7    11.45 2.42 774  4.736  <.0001 
## 
## Results are averaged over the levels of: group 
## P value adjustment: holm method for 21 tests
# group by stage interaction
emmeans(aov_m1, "group", by= "stage")
## stage = stage1:
##  group   emmean   SE  df lower.CL upper.CL
##  Complex   55.4 2.67 766     50.2     60.7
##  Simple    58.7 2.65 761     53.5     63.9
## 
## stage = stage2:
##  group   emmean   SE  df lower.CL upper.CL
##  Complex   63.3 2.67 766     58.1     68.6
##  Simple    70.3 2.65 761     65.1     75.5
## 
## stage = stage3:
##  group   emmean   SE  df lower.CL upper.CL
##  Complex   70.0 2.67 766     64.7     75.2
##  Simple    79.3 2.65 761     74.1     84.5
## 
## stage = stage4:
##  group   emmean   SE  df lower.CL upper.CL
##  Complex   71.6 2.67 766     66.3     76.8
##  Simple    87.6 2.65 761     82.4     92.8
## 
## stage = stage5:
##  group   emmean   SE  df lower.CL upper.CL
##  Complex   64.2 2.67 766     58.9     69.4
##  Simple    60.5 2.65 761     55.3     65.8
## 
## stage = stage6:
##  group   emmean   SE  df lower.CL upper.CL
##  Complex   57.9 2.67 766     52.7     63.2
##  Simple    47.2 2.65 761     41.9     52.4
## 
## stage = stage7:
##  group   emmean   SE  df lower.CL upper.CL
##  Complex   51.1 2.67 766     45.9     56.4
##  Simple    31.1 2.65 761     25.9     36.3
## 
## Warning: EMMs are biased unless design is perfectly balanced 
## Confidence level used: 0.95
update(pairs(emmeans(aov_m1, "group", by= "stage")), by = NULL, adjust = "holm") 
##  contrast         stage  estimate   SE  df t.ratio p.value
##  Complex - Simple stage1    -3.27 3.76 763 -0.868  0.6673 
##  Complex - Simple stage2    -6.96 3.76 763 -1.851  0.1935 
##  Complex - Simple stage3    -9.29 3.76 763 -2.469  0.0550 
##  Complex - Simple stage4   -16.02 3.76 763 -4.259  0.0001 
##  Complex - Simple stage5     3.64 3.76 763  0.967  0.6673 
##  Complex - Simple stage6    10.79 3.76 763  2.868  0.0213 
##  Complex - Simple stage7    20.08 3.76 763  5.337  <.0001 
## 
## P value adjustment: holm method for 7 tests

You can use the afex_plot function from afex to create beautiful plots. Those plots interacts nicely with ggplot:

afex_plot(aov_m1, x = "stage", trace = "group", error='between',
          line_arg = list(size=1),
          point_arg = list(size=3.5),
          data_arg = list(size= 1, color= 'grey', width=.4),
          data_geom = geom_boxplot,
          mapping = c("linetype", "shape", "fill"),
          legend_title = "Group") +
  labs(y = "Truth Likelihhod Estimate", x = "") +
  theme_bw()+ # remove the grey background and grid
  theme(axis.text=element_text(size=13),
        axis.title = element_text(size = 13),
        legend.text=element_text(size=13),
        legend.title=element_text(size=13),
        legend.position='bottom',
        legend.key.size = unit(1, "cm"),
        legend.background = element_rect(colour = 'black', fill = 'white', linetype='solid'))+
  scale_color_simpsons() +
  scale_fill_simpsons()

If you are interested in this topic, check out this nice tutorial about using afex to run ANOVA, and also this interesting tutorial on the emmeans package.

7.2.1 Exercise

Rotello et al. (2018) investigated the association between the race (White vs. Black faces) and the gun-tool judgments. In their first experiments, they presented participants with 16 White male faces and 16 Black male faces, and following that 8 images of guns and 8 images of tools. They asked participants to judge if the object is a tool or a gun by pressing keyboard buttons. Then, they ran an ANOVA to see if participants’ gun responses are higher for any of the races. So, they included prime race (Black, White) and target identity (gun, tool) as independent variables and participants’ gun responses as dependent variable into their linear model (See the original study here). They found that:

“Participants made more gun responses to guns than to tools, F(1,45) = 53243, p < 0.0001, η2g = 0.998. However, the race of the prime face did not matter, F(1,45) = 0.287, p > 0.59, η2g = 0.001, nor was there an interaction of prime race with target object, F(1,45) = 0.022, p > 0.88, η2g = 0.000)”.

Open the rotello_shooter_exp1.csv file and try to reproduce their results. Run an ANOVA (type III) with resp as the dependent variable and target, prime, and their interaction as independent variables.

# load the general data file
rotello_data <- read_csv(here("cleaned_data","rotello_shooter_exp1.csv"))

# ANOVA
rotello_aov <- aov_car (resp ~ target*prime +
           Error(subject/target*prime), data = rotello_data)
Effect df MSE F ges p.value
target 1, 45 0.00 53242.99 *** >.99 <.0001
prime 1, 45 0.00 0.29 .001 .59
target:prime 1, 45 0.00 0.02 <.0001 .88

7.3 Correlation

Here, we want to check the correlation between variables on the narcissism_data. First, we need to remove subject column because it is not numeric:

narcissism_data_cor <- narcissism_data %>%
  select(-subject)
#-- Base R:
cor(narcissism_data_cor, method = "pearson",  use = "complete.obs")

#-- Psych library:
psych::pairs.panels(narcissism_data_cor, method = "pearson", hist.col = "#00AFBB", density = T, ellipses = F, stars = T)

#-- Correlation library:
# install.packages("devtools")
# devtools::install_github("easystats/correlation")
#library("correlation")
correlation::correlation(narcissism_data_cor) %>% summary()

#-- apaTables library:
narcissism_data_cor %>% 
  apaTables::apa.cor.table(filename="./outputs/CorMatrix.doc", show.conf.interval=T)
psychopathy self_esteem narcissism mental_health
psychopathy 1.00 0.15 0.40 -0.44
self_esteem 0.15 1.00 0.11 -0.29
narcissism 0.40 0.11 1.00 -0.26
mental_health -0.44 -0.29 -0.26 1.00
Parameter mental_health narcissism self_esteem
psychopathy -0.44 0.40 0.15
self_esteem -0.29 0.11
narcissism -0.26

Now that we learned about correlation test, let’s answer to another question of this study: does persuasion and dissuasion is related to open-mindedness, cognitive ability, reasoning abilities, and thinking style? To answer this question, we need to create two indexes (scores) one for persuasion and one for dissuasion. Then we can do a correlation test:

cor_data_exp1 <- data_exp1 %>% 
  pivot_wider(names_from = stage, values_from = truth_estimate) %>%
  group_by(subject) %>%
  mutate(persuasion_index= stage2+ stage3+ stage4 - stage1,
         dissuasion_index= (101-stage5) + (101-stage6) + (101-stage7) - (101-stage4)) %>%
  ungroup()%>%
  dplyr::select(persuasion_index,dissuasion_index,openminded_total,numeracy_total,thinking_total,reasoning_total)

#---------- Base R:
cor(cor_data_exp1, method = "pearson",  use = "complete.obs")

#---------- Psych library:
cor_data_exp1 %>% 
  psych::pairs.panels(method = "pearson", hist.col = "#00AFBB", density = T, ellipses = F, stars = T)

#---------- Correlation library:
correlation::correlation(cor_data_exp1) %>% summary()

#---------- apaTables library:
cor_data_exp1 %>% 
  apaTables::apa.cor.table(filename="./outputs/CorMatrix.doc", show.conf.interval=T)
persuasion_index dissuasion_index openminded_total numeracy_total thinking_total reasoning_total
persuasion_index 1.00 0.26 0.25 0.16 0.16 0.11
dissuasion_index 0.26 1.00 -0.03 -0.03 -0.09 0.15
openminded_total 0.25 -0.03 1.00 0.40 0.26 0.11
numeracy_total 0.16 -0.03 0.40 1.00 0.44 0.15
thinking_total 0.16 -0.09 0.26 0.44 1.00 0.29
reasoning_total 0.11 0.15 0.11 0.15 0.29 1.00
Parameter reasoning_total thinking_total numeracy_total openminded_total dissuasion_index
persuasion_index 0.11 0.16 0.16 0.25 0.26
dissuasion_index 0.15 -0.09 -0.03 -0.03
openminded_total 0.11 0.26 0.40
numeracy_total 0.15 0.44
thinking_total 0.29

7.3.1 Exercise

Pennycook et al. (2020) investigated the relationship between actively open-minded thinking style about evidence (AOT-E) and different political, scientific, and religious beliefs (see the original paper here). In their first experiment, they calculated the correlation of AOTE and scientific beliefs items (global warming, evolution, etc.) and they found the following results:

Open the pennycook_aote_exp1.csv file and try to reproduce their results by creating the same correlation matrix.

pennycook_data <- read_csv(here("cleaned_data","pennycook_aote_exp1.csv")) 


#---------- Base R:
cor(pennycook_data, method = "pearson",  use = "complete.obs")

#---------- Psych library:
pennycook_data %>% 
  psych::pairs.panels(method = "pearson", hist.col = "#00AFBB", density = T, ellipses = F, stars = T)

#---------- Correlation library:
correlation::correlation(pennycook_data) %>% summary()

#---------- apaTables library:
pennycook_data %>% 
  apaTables::apa.cor.table(filename="./outputs/CorMatrix.doc", show.conf.interval=T)
Parameter trust_scien gm_health tech_problems modern_medicine old_earth vaccines stem_cell big_bang evolution global_warming
aote 0.35 0.36 0.44 0.33 0.40 0.47 0.45 0.51 0.51 0.37
global_warming 0.42 0.06 0.14 0.18 0.33 0.26 0.31 0.33 0.38
evolution 0.48 0.33 0.28 0.36 0.47 0.39 0.54 0.78
big_bang 0.49 0.37 0.28 0.36 0.45 0.37 0.54
stem_cell 0.47 0.34 0.36 0.47 0.40 0.40
vaccines 0.43 0.52 0.49 0.53 0.38
old_earth 0.29 0.24 0.21 0.33
modern_medicine 0.43 0.42 0.47
tech_problems 0.33 0.39
gm_health 0.31

7.4 Linear Regression

Here, we do single and multiple linear regreassion on the narcissism_data:

m1 <- lm(mental_health~narcissism, data= narcissism_data)
term estimate std.error statistic p.value
(Intercept) 4.86 0.56 8.75 0
narcissism -0.04 0.01 -3.04 0
m2 <- lm(mental_health~narcissism+psychopathy, data= narcissism_data)
term estimate std.error statistic p.value
(Intercept) 5.43 0.53 10.27 0.00
narcissism -0.02 0.01 -1.09 0.28
psychopathy -0.19 0.04 -4.71 0.00

Now, let’s perform regression analyses on our own dataset. In the previous section, we found that open-mindedness (AOT-E) is correlated with persuasion. Now, one may ask if open-mindedness can predict persuasion after controlling for reasoning and controlling abilities? To answer that, we can run a multiple regression analysis:

exp1_reg=lm(persuasion_index ~ openminded_total+ numeracy_total+ thinking_total+ reasoning_total,
                  data=cor_data_exp1)
term estimate std.error statistic p.value
(Intercept) 78.57 33.08 2.38 0.02
openminded_total 1.62 0.72 2.23 0.03
numeracy_total 0.72 2.11 0.34 0.73
thinking_total 3.09 4.51 0.68 0.49
reasoning_total 1.77 2.52 0.70 0.48

7.4.1 Exercise

Trémolière and Djeriouat (2020) examined the role of cognitive reflection and belief in science in climate change skepticism. In their first study, they revealed that cognitive reflection and belief in science negetively predicted climate change skepticism even after controlling for demographic and cognitive ability variables (see the original paper here).

Open the tremoliere_data_exp1.csv file and try to reproduce their results by running a multiple linear regression. Enter age, gender, education, belief in science, literacy, numeracy (Numtotal), and cognitive reflection as predictors and enter climate change skepticism (climato) as the outcome variable.

Tremoliere_data <- read_csv(here("cleaned_data","tremoliere_data_exp1.csv"))

Tremoliere_reg=lm(Climato ~ Age+ Gender+ Education+ BeliefInSciencetotal+ Literacy+ Numtotal+ CognitiveReflection,
                    data=Tremoliere_data)
term estimate std.error statistic p.value
(Intercept) 57.57 5.19 11.09 0.00
Age 0.01 0.05 0.24 0.81
Gender -5.68 1.34 -4.23 0.00
Education 0.54 0.38 1.43 0.15
BeliefInSciencetotal -0.20 0.06 -3.62 0.00
Literacy -0.49 0.51 -0.96 0.34
Numtotal -1.52 0.83 -1.82 0.07
CognitiveReflection -18.58 4.26 -4.37 0.00
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
0.19 0.17 12.65 11.91 0 7 -1467.77 2953.54 2988.81 58235.89 364 372

8 Rmarkdown

To be completed…

9 References

  • Ghasemi, O., Handley, S., & Howarth, S. (2020). The Bright Homunculus in our Head: Individual Differences in Intuitive Sensitivity to Logical Validity.

  • John, L. K., Jeong, M., Gino, F., & Huang, L. (2019). The self-presentational consequences of upholding one’s stance in spite of the evidence. Organizational Behavior and Human Decision Processes, 154, 1-14.

  • Pennycook, G., Cheyne, J. A., Koehler, D. J., & Fugelsang, J. A. (2020). On the belief that beliefs should change according to evidence: Implications for conspiratorial, moral, paranormal, political, religious, and science beliefs. Judgment and Decision Making, 15(4), 476.

  • Rotello, C. M., Kelly, L. J., Heit, E., Vazire, S., & Vul, E. (2018). The Shape of ROC Curves in Shooter Tasks: Implications for Best Practices in Analysis. Collabra: Psychology, 4(1).

  • Trémolière, B., & Djeriouat, H. (2020). Don’t you see that its cold! Exploring the roles of cognitive reflection, climate science literacy, illusion of knowledge, and political orientation in climate change skepticism.

  • Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), 1-23.

---
title: "Introduction to R"
author:
  - name: "Omid Ghasemi"
    affiliation: Macquarie University
    email: omidreza.ghasemi@hdr.mq.edu.au
  - name: "Mahdi Mazidi"
    affiliation: University of Western Australia
    email: mahdi.mazidisharafabadi@research.uwa.edu.au
date: "`r format(Sys.time(), '%d %B, %Y')`"
output: 
  html_document:
    keep_md: yes
    number_sections: true
    theme: cerulean
    code_download: true
    #code_folding: hide
    toc: true
    toc_float: true
    df_print: "kable"
---

This document is the summary of the **Introduction to R** workshop. 

All correspondence related to this document should be addressed to: 

<center>
Omid Ghasemi (Macquarie University, Sydney, NSW, 2109, AUSTRALIA) 

Email: omidreza.ghasemi@hdr.mq.edu.au 
</center>



<style>

body{ /* Normal  */
      font-size: 18px;
      text-align: justify;
      line-height: 1.6;
      font-family: "Times New Roman", Times, serif;
}
code.r{ /* Code block */
    font-size: 14px;
}
pre { /* Code block - determines code spacing between lines */
    font-size: 12px;
}

</style>


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(fig.align="center")
```



```{r libraries, message=FALSE, echo=F}
# load libraries
library(tidyverse)
library(here)
library(janitor)
library(broom)
library(afex)
library(emmeans)
library(knitr)
library(kableExtra)
library(ggsci)
library(patchwork)
library(skimr)
# install.packages("devtools")
# devtools::install_github("easystats/correlation")
library("correlation")
options(scipen=999) # turn off scientific notations
options(contrasts = c('contr.sum','contr.poly')) # set the contrast sum globally 
options(knitr.kable.NA = '')
```


# Research Question


The aim of the study is to test if simple arguments are more effective in belief revision than more complex arguments. To that end, we present participants with an imaginary scenario (two alien creatures on a planet) and a theory (one creature is predator and the other one is prey) and we ask them to rate the likelihood truth of the theory based on a simple fact (We adapted this method from Gregg et al.,2017; see the original study [here](https://journals.sagepub.com/doi/10.1080/17470218.2015.1099162)). Then, in a between-subject manipulation, participants will be presented with either 6 simple arguments (Modus Ponens conditionals) or 6 more complex arguments (Modus Tollens conditionals), and they will be asked to rate the likelihood truth of the initial theory on 7 stages. 

The first stage is the base rating stage. The next three stages include supportive arguments of the theory and the last three arguments include disproving arguments of the theory. We hypothesized that the group with simple arguments shows better persuasion (as it reflects in higher ratings for the supportive arguments) and better dissuasion (as it reflects in lower ratings for the opposing arguments).

In the last part of the study, participants will be asked to answer several cognitive capacity/style measures including thinking style (CRT), open-mindedness (AOT-E), reasoning ability (mindware), and numeracy scales. We hypothesized that cognitive ability, cognitive style, and open-mindedness are positive predictors of persuasion and dissuasion. These associations should be more pronounced for participants in the group with complex arguments because the ability and willingness to engage in deliberative thinking may favor participants to assess the underlying logical structure of those arguments. However, for participants in the simple group, the logical structure of arguments is more evident, so participants with lower ability can still assess the logical status of those arguments.
 

```{r fig.align='center', echo=FALSE}
knitr::include_graphics(here('inputs','exp_design.png'))
```

Thus, our hypotheses for this experiment are as follows:

- Participants in the group with simple arguments have higher ratings for supportive arguments (They are more easily persuaded than those in the group with complex arguments).

- Participants in the group with simple arguments have lower ratings for opposing arguments (They are more easily dissuaded than those in the group with complex arguments).

- There are significant associations between thinking style (CRT), open-mindedness (AOT-E), reasoning ability (mindware), and numeracy scales with both persuasion and dissuasion indexes in each group and in the entire sample. The relationship between these measures should be stronger, although not significantly, for participants in the group with complex arguments.


```{r echo=FALSE, out.width="550px", out.height="400px"}
knitr::include_graphics(here('inputs','prediction_plot.png'))
```


# Getting Ready

First, we need to design the experiment. For this experiment, we use online platforms for data collection. There are several options such as Gorilla, JSpsych, Qualtrics, psychoJS (pavlovia), etc. Since we do not need any reaction time data, we simply use Qualtrics. For an overview of different lab-based and online platforms, see [here](https://omidghasemi21.github.io/human_data/Scripts/behavioral_data.html). 

Next, we need to decide on the number of participants (sample size). For this study, we do not sue power analysis since we cannot access more than 120 participants. However, it is highly suggested calculate sample size using power estimation. You can find some nice tutorials on how to do that [here](https://julianquandt.com/post/power-analysis-by-data-simulation-in-r-part-i/), [here](https://nickch-k.github.io/EconometricsSlides/Week_08/Power_Simulations.html), and [here](https://cran.r-project.org/web/packages/paramtest/vignettes/Simulating-Power.html).

After we created the experiment and decided on the sample size, the next step is to preresigter the study. However, it would be better to do a pilot with 4 or 5 participants, clean all the data, do the desired analysis, and then pre-register the analysis and those codes. You can find the preregistration form for the current study [here](https://osf.io/79r6e).

Finally, we need to restructure our project in a tidy folder with different sub-folders. Having a clean and tidy folder structure can save us! There are different formats of folder structure (for example, see [here](http://nikola.me/folder_structure.html) and [here](https://slides.com/djnavarro/workflow)), but for now, we use the following structure:

```{r echo=FALSE, out.width="700px", out.height="200px"}
knitr::include_graphics(here('inputs','folder_structure.png'))
```


# Introduction to R
```{r message=FALSE, eval=F}
# load libraries
library(tidyverse)
library(here)
library(janitor)
library(broom)
library(afex)
library(emmeans)
library(knitr)
library(kableExtra)
library(ggsci)
library(patchwork)
library(skimr)
# install.packages("devtools")
# devtools::install_github("easystats/correlation")
library("correlation")
options(scipen=999) # turn off scientific notations
options(contrasts = c('contr.sum','contr.poly')) # set the contrast sum globally 
options(knitr.kable.NA = '')
```


```{r echo=FALSE, out.width="700px", out.height="700px", fig.cap= "Artwork by Allison Horst: https://github.com/allisonhorst/stats-illustrations"}
knitr::include_graphics(here('inputs','r_first_then.png'))
```


R can be used as a calculator. For mathematical purposes, be careful of the order in which R executes the commands.

```{r}
10 + 10

4 ^ 2

(250 / 500) * 100
```

R is a bit flexible with spacing (but no spacing in the name of variables and words)

```{r}
10+10

10                 +           10
```

R can sometimes tell that you're not finished yet

```{r eval=F}
10 +
```

How to create a *variable*? Variable assignment using `<-` and `=`. Note that R is case sensitive for everything

```{r}
pay <- 250

month = 12

pay * month

salary <- pay * month
```


Few points in naming variables and vectors: use short, informative words, keep same method (e.g., not using capital words, use only _ or . ).

## Function 
Function is a set of statements combined together to perform a specific task. When we use a block of code repeatedly, we can convert it to a function. To write a function, first, you need to *define* it:

```{r}
my_multiplier <- function(a,b){
  result = a * b
  return (result)
}
```

This code do nothing. To get a result, you need to *call* it:

```{r}
my_multiplier (a=2, b=4)
# or: my_multiplier (2, 4)
```

We can set a default value for our arguments:

```{r}
my_multiplier2 <- function(a,b=4){
  result = a * b
  return (result)
}

my_multiplier2 (a=2)
# or: my_multiplier (2)
# or: my_multiplier (2, 6)
```

Fortunately, you do not need to write everything from scratch. R has lots of built-in functions that you can use:
```{r}
round(54.6787)
round(54.5787, digits = 2)
```

Use `?` before the function name to get some help. For example, `?round`. You will see many functions in the rest of the workshop.

## Basic Data Types in R:

function `class()` is used to show what is the type of a variable.


1. *Logical*: `TRUE`, `FALSE` can be abbreviated as `T`, `F`.  They has to be capital, 'true' is not a logical data:
```{r}
class(TRUE)
class(F)
```

2. *Numeric*: all numbers e.g. 5,  10.5,  11,37;  a special type of numeric is "integer" which is numbers without decimal. Integers are always numeric, but numeric is not always integer:
```{r}
class(2)
class(13.46)
```

3. *Character*: text for example, "I love R" or "4" or "4.5":
```{r}
class("ha ha ha ha")
class("56.6")
class("TRUE")
```

Can we change the type of data in a variable? Yes, you need to use the function `as.---()`

```{r}
as.numeric(TRUE)
as.character(4)
as.numeric("4.5")
as.numeric("Hello")
```


## Data Structures in R


**Vector**: when there are more than one number or letter stored. Use the combine function c() for that.

```{r}
sale <- c(1, 2, 3,4, 5, 6, 7, 8, 9, 10) # also sale <- c(1:10)

sale <- c(1:10)

sale * sale
```

*Subsetting a vector*:

```{r}
days <- c("Saturday", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

days[2]
days[-2]

days[c(2, 3, 4)]
```


### Exercise

Create a vector named `my_vector` with numbers from 0 to 1000 in it:

```{r}
my_vector <- (0:1000)

mean(my_vector)
median(my_vector)
min(my_vector)
range(my_vector)
class(my_vector)
sum(my_vector)
sd(my_vector)
```

**List**: allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other list.

```{r}
my_list = list(sale, 1, 3, 4:7, "HELLO", "hello", FALSE)
my_list
```

**Factor**: Factors store the vector along with the distinct values of the elements in the vector as labels. The labels are always character irrespective of whether it is numeric or character. For example, variable gender with "male" and "female" entries:

```{r}
gender <- c("male", "male", "male", " female", "female", "female")
gender <- factor(gender)
```

R now treats gender as a nominal (categorical) variable: 1=female, 2=male internally (alphabetically).
```{r}
summary(gender)
```

*Question*: why when we ran the above function i.e. summary(), it showed three and not two levels of the data? *Hint*: run 'gender'.

```{r}
gender
```

So, be careful of spaces!

### Exercise
Create a gender factor with 30 male and 40 females (*Hint*: use the `rep()` function):
```{r}
gender <- c(rep("male",30), rep("female", 40))
gender <- factor(gender)
gender
```

There are two types of categorical variables: nominal and ordinal. How to create ordered factors (when the variable is nominal and values can be ordered)? We should add two additional arguments to the `factor()` function: `ordered = TRUE`, and `levels = c("level1", "level2")`. For example, we have a vector that shows participants' education level.

```{r}
edu<-c(3,2,3,4,1,2,2,3,4)

education<-factor(edu, ordered = TRUE)
levels(education) <- c("Primary school","high school","College","Uni graduated")
education
```

### Exercise
We have a factor with `patient` and `control` values. Here, the first level is control and the second level is patient. Change the order of levels, so patient would be the first level:

```{r}
health_status <- factor(c(rep('patient',5),rep('control',5)))
health_status

health_status_reordered <- factor(health_status, levels = c('patient','control'))
health_status_reordered
```

Finally, can you relabel both levels to uppercase characters? (*Hint*: check `?factor`)

```{r}
health_status_relabeled <- factor(health_status, levels = c('patient','control'), labels = c('Patient','Control'))
health_status_relabeled
```


**Matrices**: All columns in a matrix must have the same mode(numeric, character, etc.) and the same length. It can be created using a vector input to the matrix function.

```{r}
my_matrix = matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3)

my_matrix
```

**Data frames**: (two-dimensional objects) can hold numeric, character or logical values. Within a column all elements have the same data type, but different columns can be of different data type. Let's create a dataframe:

```{r}
id <- 1:200
group <- c(rep("Psychotherapy", 100), rep("Medication", 100))
response <- c(rnorm(100, mean = 30, sd = 5),
             rnorm(100, mean = 25, sd = 5))

my_dataframe <-data.frame(Patient = id,
                          Treatment = group,
                          Response = response)
```

We also could have done the below

```{r}
my_dataframe <-data.frame(Patient = c(1:200),
                          Treatment = c(rep("Psychotherapy", 100), rep("Medication", 100)),
                          Response = c(rnorm(100, mean = 30, sd = 5),
                                       rnorm(100, mean = 25, sd = 5)))
```

In large data sets, the function head() enables you to show the first observations of a data frames. Similarly, the function tail() prints out the last observations in your data set.

```{r eval=F}
head(my_dataframe) 
tail(my_dataframe)
```

```{r echo=F}
head(my_dataframe) %>%
  knitr::kable() %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = T)
tail(my_dataframe)%>%
  knitr::kable() %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = T)
```

Similar to vectors and matrices, brackets [] are used to selects data from rows and columns in data.frames:

```{r}
my_dataframe[35, 3]
```

### Exercise

How can we get all columns, but only for the first 10 participants?

```{r eval=F}
my_dataframe[1:10, ]
```

```{r echo=F}
knitr::kable(my_dataframe[1:10, ]) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = T)

```
How to get only the Response column for all participants?

```{r}
my_dataframe[ , 3]
```

Another easier way for selecting particular items is using their names that is more helpful than number of the rows in large data sets:
```{r eval=F}
my_dataframe[ , "Response"]
# OR:
my_dataframe$Response

```


# Data Cleaning

```{r echo=FALSE, out.width="700px", out.height="350px", fig.cap= "Artwork by Allison Horst: https://github.com/allisonhorst/stats-illustrations"}
knitr::include_graphics(here('inputs','environmental-data-science-r4ds-general.png'))
```


```{r echo=FALSE, out.width="700px", out.height="350px", fig.cap= "Artwork by Allison Horst: https://github.com/allisonhorst/stats-illustrations"}
knitr::include_graphics(here('inputs','cracked_setwd.png'))
```

Now, suppose we tested 141 students. First, let's read and check the uncleaned data:
```{r message=F, warning=F, eval=F}
# read the raw data
raw_data <- read_csv(here("raw_data","raw_data_exp1.csv"))
head(raw_data)
```

```{r message=F, warning=F, echo=F}
# read the raw data
raw_data <- read_csv(here("raw_data","raw_data_exp1.csv"))

knitr::kable(head(raw_data)) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = F)%>%
  scroll_box(width = "780px")
```

### Exercise

There is a dataset in the `cleaned_data` folder named `unicef_u5mr.csv`. Read the dataset using `read_csv` and `here`.
```{r message=F, warning=F}
unicef_data <- read_csv(here("cleaned_data","unicef_u5mr.csv"))
```

```{r echo=FALSE, out.width="700px", out.height="350px", fig.cap= "Artwork by Allison Horst: https://github.com/allisonhorst/stats-illustrations"}
knitr::include_graphics(here('inputs','tidydata_3.jpg'))
```



In order to clean the data, we use *tidyverse* which is a collection of packages to work with data. One of the tidyverse packages that we use regularly is `dplyr` which includes several functions:

- `mutate()` adds new variables or change existing ones.
- `select()` pick variables (columns) based on their names.
- `filter()` picks cases (rows) based on their values.
- `summarise()` gives a single single summary of the data (e.g., mean, counts, etc.)
- `arrange()` changes the ordering of the rows.
- `group_by()` divides your dataframe into grouped dataframes and allow you to do each of the above operations (except for `arrange`) on every one of them separately.

Pick `subject`, `age`, and `gender` columns:

```{r message=F, warning=F, eval=F}
selected_data <- select(raw_data, subject, age, gender)
```

Now, do the following tasks: pick all the male participants, pick all the male participants **or** those greater than 25 years old, and finally pick all male participants **and** those greater than 25 years old:

```{r message=F, warning=F, eval=F}
# filter all males
filtered_data_male <- filter(raw_data, gender == "Male")
# filter males and older than 25
filtered_data_male_and_greater25 <- filter(raw_data, gender == "Male" & age > 25 )
# filter males or older than 25
filtered_data_male_or_greater25 <- filter(raw_data, gender == "Male" | age > 25 )
```

Arrange (order) your dataframe based on the age, once in an ascending order (youngers first) and once based on descending order (olders first):

```{r message=F, warning=F}
# order participants based on their age
arranged_data <- arrange(raw_data, age)
# order participants based on their age (descendeing)
arranged_data_descending <- arrange(raw_data, desc(age))
```

Create a column to show if the participant has finished the task or not:

```{r message=F, warning=F}
mutated_data <- mutate (raw_data, finished= case_when(progress==100~ "Yes",T~ "No"))
```

Summarize participants age and sd:
```{r message=F, warning=F}
summarise(raw_data, mean= mean(age, na.rm=T),
          sd= sd (age, na.rm=T))
```

A new function: **pipe operators** `%>%` pipes a value into the next function:

```{r message=F, warning=F}
raw_data %>% 
  summarise(., mean= mean(age, na.rm=T),
            sd= sd (age, na.rm=T))
```


```{r message=F, warning=F}
raw_data %>% 
  summarise(mean= mean(age, na.rm=T),
            sd= sd (age, na.rm=T))
```

Calculate the age mean of younger than 25 participants only:

```{r message=F, warning=F}
raw_data %>% 
  filter (age < 25) %>%
  summarise(mean= mean(age, na.rm=T),
            sd= sd (age, na.rm=T))
```

Calculate the age mean of younger than 25 participants  for each gender separately:

```{r message=F, warning=F}
raw_data %>% 
  filter (age < 25) %>%
  group_by(gender) %>%
  summarise(mean= mean(age, na.rm=T),
            sd= sd (age, na.rm=T)) %>%
  ungroup ()
```         


### Exercise

1. Create a column to show if participant is older than 23 or not and then calculate reasoning ability (`reasoning_total`) mean for each group separately:
```{r message=F, warning=F}
raw_data %>%
  mutate(age_group = case_when(age > 23 ~ "greater than 23", T~ "younger than 23")) %>%
  group_by(age_group) %>%
  summarise(reasoning_total = mean(reasoning_total, na.rm=T))
```     

2. Add the open_mindedness total score (sum) to the dataframe and then convert subject column to factor:
```{r message=F, warning=F}
mutated_openmind_data <- raw_data %>%
  mutate(openminded_total= openminded1+openminded2+openminded3+openminded4+openminded5+openminded6+openminded7+openminded8) %>%
  mutate(subject= factor(subject))
``` 


Next, we want to pivot our data to switch between long and wide format:

```{r echo=FALSE, out.width="700px", out.height="350px", fig.cap= "Artwork by Allison Horst: https://github.com/allisonhorst/stats-illustrations"}
knitr::include_graphics(here('inputs','tidydata_1.jpg'))
```

```{r message=F, warning=F}
# pivoting your data


# Make you data long
long_data <- raw_data %>%
  select(subject, stage1_simple:stage7_simple,stage1_complex:stage7_complex) %>%
  pivot_longer(cols = c(stage1_simple:stage7_complex), names_to = 'stage', values_to = 'truth_estimate')


# Make you data wide
wide_data <- long_data %>%
  pivot_wider(names_from = stage, values_from= truth_estimate)

```

### Exercise

Convert the UNICEF dataset to long and wide formats:
```{r message=F, warning=F}
unicef_data <- read_csv(here("cleaned_data","unicef_u5mr.csv"))

library(janitor)
unicef_data_cleaned <- unicef_data %>%
  clean_names()

unicef_long_data <- unicef_data_cleaned %>% pivot_longer(cols = c(u5mr_1950:u5mr_2015), names_to = 'year', values_to = 'u5mr')
unicef_wideg_data <- unicef_long_data %>% pivot_wider(names_from = 'year', values_from = 'u5mr')
```

*Note*: The codes for the previous exercise were taken from [this blog post](https://sejdemyr.github.io/r-tutorials/basics/wide-and-long/) written by Simon Ejdemyr.

Now, let's do some cleaning using `dplyr`, `tidyr` and other `tidyverse` libraries: 
```{r message=F, warning=F, eval=F}
cleaned_data <- raw_data %>% 
  filter(progress == 100) %>% # filter out unfinished participants
  select(-end_date, -status,-ip_address, -duration_in_seconds, -recorded_date:-user_language) %>% #remove some useless columns
  mutate(openminded_total= openminded1+openminded2+openminded3+openminded4+openminded5+openminded6+openminded7+openminded8) %>%# create a total score for our questionnaire
  mutate(thinking1= case_when(thinking1==4~ 1,T~0),
         thinking2= case_when(thinking2==10~ 1,T~0),
         thinking3= case_when(thinking3==39~ 1,T~0),
         thinking_total= thinking1 + thinking2 + thinking3) %>%
  select(-thinking1:-openminded8) %>%
  pivot_longer(cols = c(stage1_simple:stage7_simple,stage1_complex:stage7_complex),names_to = 'stage',values_to = 'truth_estimate') %>% # make our dataframe long
  #pivot_wider(names_from = stage, values_from= truth_estimate) # this code change our dataframe back to wide
  filter(!is.na(truth_estimate)) %>% #remove rows with truth_estimate == NA
  mutate(stage= gsub("_.*", "", stage)) %>%
  rename(consent= consent_form) # rename a column
```


```{r message=F, warning=F, echo=F}
cleaned_data <- raw_data %>% 
  filter(progress == 100) %>% # filter out unfinished participants
  select(-end_date, -status,-ip_address, -duration_in_seconds, -recorded_date:-user_language) %>% #remove some useless columns
  mutate(openminded_total= openminded1+openminded2+openminded3+openminded4+openminded5+openminded6+openminded7+openminded8) %>%# create a total score for our questionnaire
  mutate(thinking1= case_when(thinking1==4~ 1,T~0),
         thinking2= case_when(thinking2==10~ 1,T~0),
         thinking3= case_when(thinking3==39~ 1,T~0),
         thinking_total= thinking1 + thinking2 + thinking3) %>%
  select(-thinking1:-openminded8) %>%
  pivot_longer(cols = c(stage1_simple:stage7_simple,stage1_complex:stage7_complex),names_to = 'stage',values_to = 'truth_estimate') %>% # make our dataframe long
  #pivot_wider(names_from = stage, values_from= truth_estimate) # this code change our dataframe back to wide
  filter(!is.na(truth_estimate)) %>% #remove rows with truth_estimate == NA
  mutate(stage= gsub("_.*", "", stage)) %>%
  rename(consent= consent_form)

knitr::kable(head(cleaned_data)) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = F)%>%
  scroll_box(width = "780px")
```

Ok, now the data is clean and tidy which means:

> 1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table ([Wickham](https://vita.had.co.nz/papers/tidy-data.pdf), 2014).


Check the dataframe and all the data types:
```{r}
str(cleaned_data)
```

Finally, we save our data to the `cleaned_data` folder.

```{r}
write_csv(cleaned_data, here("cleaned_data","cleaned_data_exp1.csv"))
```

# Descriptive Statistics

```{r echo=FALSE, out.width="700px", out.height="350px", fig.cap= "Artwork by Allison Horst: https://github.com/allisonhorst/stats-illustrations"}
knitr::include_graphics(here('inputs','not_normal.png'))
```

> Note: All the data that we use here is manipulated (fabricated) for teaching purpuses. In our study, we failed to find such beautiful and interesting results.

Now, let's do some descriptive statistics. Now, we can open a new script called `data_analysis.r` and read some datasets. Then we use `skimr` package to describe our data.

```{r message=F, warning=F,}
narcissism_data <- read_csv(here("cleaned_data","narcissism_data.csv"))
narcissism_data %>% skimr::skim()
```

### Exercise

1. Open the dataset called `treatment_data.csv` in the cleaned_data folder and do a descriptive analysis:

```{r message=F, warning=F}
treatment_data <- read_csv(here("cleaned_data","treatment_data.csv"))
treatment_data %>% skimr::skim()
```

2. Do the same thing for the `memory_data.csv`.

```{r message=F, warning=F}
memory_data <- read_csv(here("cleaned_data","memory_data.csv"))
memory_data %>% group_by(time) %>%
  skimr::skim()
```

Now, let's describe our experiment data. Be careful, we need some data reshaping before description:

```{r message=F, warning=F,}
data_exp1_orig <- read_csv(here("cleaned_data","cleaned_data_exp1.csv"))


data_exp1 <- data_exp1_orig%>% 
  #mutate_if(is.character, factor) %>%
  mutate(subject= factor(subject), # convert all characters to factor
         group = factor(group),
         stage = factor(stage))
```

How many participants in total?

```{r message=F, warning=F, eval=F}
data_exp1 %>% summarise(n= n_distinct(subject))
```


```{r message=F, warning=F, echo=F}
data_exp1 %>% summarise(n= n_distinct(subject))%>%
  knitr::kable() %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), full_width = F)
```

how many participants in each group?
```{r message=F, warning=F, eval=F}
data_exp1 %>% 
  group_by(subject) %>% 
  filter(row_number()==1) %>% 
  ungroup () %>% 
  group_by(group) %>% 
  count() 
```

```{r message=F, warning=F, echo=F}
data_exp1 %>% group_by(subject) %>% filter(row_number()==1) %>% ungroup () %>% group_by(group) %>% count() %>%
  knitr::kable() %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T)
```

Find the mean and sd for numeric variables using base R `summary` function:

```{r}
data_exp1 %>% 
  group_by(subject) %>% 
  filter(row_number()==1) %>% 
  ungroup () %>%
  summary()
```

Alternatively, we can use `base R `summary` function`skimr` library:
```{r eval=F}
data_exp1 %>% 
  group_by(subject) %>% 
  filter(row_number()==1) %>% 
  ungroup () %>% 
  dplyr::select (age, numeracy_total, reasoning_total, openminded_total, thinking_total) %>% 
  skimr::skim()
```

```{r echo=F}
data_exp1 %>% 
  group_by(subject) %>% 
  filter(row_number()==1) %>% 
  ungroup () %>% 
  dplyr::select (age, numeracy_total, reasoning_total, openminded_total, thinking_total) %>% 
  skimr::skim() %>%
  knitr::kable() %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = F)%>%
  scroll_box(width = "780px")
```


### Exercise

For this exercise, we use a dataset of one of my own studies. In this study, we asked participants to guess the physical brightness of reasoning arguments and then we gave a cognitive ability test. (See the original study [here](https://osf.io/ebxnf/)). Open `ghasemi_brightness_exp4.csv` file and answer to the following questions:

1. How many participants did we test in total?
2. Find out how many male and female we tested.
3. Calculate mean and sd for age and cognitive ability (`cog_ability`).


```{r warning=F, message=F}
ghasemi_data <- read_csv(here("cleaned_data","ghasemi_brightness_exp4.csv"))

ghasemi_data %>% summarise(n = n_distinct(participant)) # number of participants:200

ghasemi_data %>% group_by (participant) %>% filter (row_number()==1) %>% group_by (gender) %>% summarise(n= n()) %>% ungroup() # 183 female, 17 male

ghasemi_data %>% dplyr::select (age, cog_ability) %>% skimr::skim() # mean and sd for age and cognitive ability
```


# Data Visualization

```{r echo=FALSE, out.width="700px", out.height="350px", fig.cap= "Artwork by Allison Horst: https://github.com/allisonhorst/stats-illustrations"}
knitr::include_graphics(here('inputs','ggplot2_masterpiece.png'))
```


Before starting the ggplot, lets try a visualization using a function from the Base R the plot() function shows the association of each variable against the other one in a data handy for data with few number of variables to see if there are any patterns

```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}

exam_data<- read_csv(here::here("cleaned_data", "exam_data.csv"))

plot(x = exam_data$Anxiety, y = exam_data$Exam)

```

The code also works without writing x and y, however, writing them is strongly recommended

```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}

plot(exam_data$Anxiety, exam_data$Exam)
```

`ggplot`, the gg in ggplot stands for grammar of graphics. Grammar of graphics basically says any graphical representation of data, can be produced by a series of layers. You can think of a layer as a plastic transparency. Lets draw the same plot using ggplot. Always, mention the data we are going to work with.
```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}
ggplot(data = exam_data, aes(x = Exam, y = Anxiety))
```


- `aes`: aes which stands for aesthetics is a relationship between a variable in your dataset and an aspect of the plot that is going to visually convey the information to the reader

- Visual elements are known as geoms (short for 'geometric objects') in ggplot 2. When we define a layer, we have to tell R what geom we want displayed on that layer (do we want a bar, line dot, etc.?)

```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}
ggplot(data = exam_data, aes(x = Exam, y = Anxiety))+ geom_point()
```

So, lets try some of them here like shape and size. Be careful with the + sign, if you clink enter for the next part of the code, the + sign should not go to the next line

```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}
ggplot(data = exam_data, aes(x = Exam, y = Anxiety))+
  geom_point(size = 2, shape = 8)
```

The current plot is not very informative about the patterns for each gender.
```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}
ggplot(data = exam_data, aes(x = Exam, y = Anxiety, color = Gender))+
  geom_point(size = 2, shape = 10)

ggplot(data = exam_data, aes(x = Exam, y = Anxiety, color = Gender, shape = Gender))+
  geom_point(size = 2, shape = 10)
```

Question: why the above code doesn't make any change?

```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}
ggplot(data = exam_data, aes(x = Exam, y = Anxiety, color = Gender, shape = Gender))+
  geom_point(size = 2)
```

Can assign the first layer to a variable to reduce the length of codes for next layers.

```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}
My_graph <- ggplot(data = exam_data, aes(x = Exam, y = Anxiety))

My_graph + geom_point()
```

lets add a line to the current graph
```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}
My_graph + geom_point() + geom_smooth()
```

Aesthetics can be set for all layers of the plot (i.e., defined in the plot as a whole) or can be set individually for each geom in a plot.

```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}
My_graph + geom_point(aes(color = Gender)) + geom_smooth()

My_graph + geom_point(aes(color = Gender)) + geom_smooth(aes(color = Gender))
```

The shaded area around the line is the 95% confidence interval around the line. We can switch this off by  adding `se = F` (which is short for 'standard error = False')

```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}
My_graph + geom_point() + geom_smooth(se = F)
```


What if we want our line to be a direct line?
```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}
My_graph + geom_point() + geom_smooth(se = F, method = lm)

```
How to change the labels of x and y axes?
```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}
My_graph + geom_point() + geom_smooth(se = F, method = lm) +
  labs(x = "Exam scores %", y = "Anxiety scores")
```

Histograms are used to show distributions of variables while bar charts are used to compare variables. Histograms plot quantitative data with ranges of the data grouped into bins or intervals while bar charts plot categorical data.

```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}
#ggplot(data = exam_data, aes(x = Anxiety, y = Exam )) + geom_histogram()
# the code above gives an error as geom_histogram can only have x or y axis in its aes()

ggplot(data = exam_data, aes(x = Anxiety)) + geom_histogram()

ggplot(data = exam_data, aes(y = Anxiety)) + geom_histogram()

ggplot(data = exam_data, aes(x = Anxiety)) + geom_histogram(bins = 31)

ggplot(data = exam_data, aes(x = Anxiety)) + geom_histogram(bins = 31, fill = "green")

ggplot(data = exam_data, aes(x = Anxiety)) + geom_histogram(bins = 31, fill = "green", col = "red")
```

Let's stop using the My_graph variable and write the whole code from the start again for a bar chart
```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}
ggplot(data = exam_data, aes(x = Sleep_quality))+
  geom_bar()
```
Because we want to plot a summary of the data (the mean) rather than the raw scores themselves, we have to use a stat.
```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}
ggplot(data = exam_data, aes(x = Sleep_quality, y = Exam, fill = Gender))+
  geom_bar(stat = "summary", fun = "mean")


ggplot(data = exam_data, aes(x = Sleep_quality, y = Exam, fill = Gender))+
  geom_bar(stat = "summary", fun = "mean", position = "dodge")
```

The other way to get the same plot that the code above gives, is using the stat_summary function that takes the following general form: `stat_summary(function = x, geom = y)`

```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}
ggplot(data = exam_data, aes(x = Sleep_quality, y = Exam, fill = Gender))+
  stat_summary(fun = mean, geom = "bar", position = "dodge")
```
How to combine multiple plots? How to combine multiple plots? We can use the `patchwork` package. A nice tutorial on using this package can be found [here](https://patchwork.data-imaginist.com/articles/patchwork.html)

```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}
p1 = My_graph + geom_point(aes(color = Gender)) + geom_smooth()

p2 = ggplot(data = exam_data, aes(x = Anxiety)) + geom_histogram(bins = 31)

p3 = ggplot(data = exam_data, aes(x = Sleep_quality, y = Exam, fill = Gender))+
  stat_summary(fun = mean, geom = "bar", position = "dodge")

p4 = My_graph + geom_point() + geom_smooth(se = F, method = lm) +
  labs(x = "Exam scores %", y = "Anxiety scores")

combined = p1 + p2+ p3 + p4 + plot_layout(nrow = 4, byrow = F)

combined

p1 | p2 / p3 / p4

p1 | p2 / (p3 / p4)
```

`ggsave()` function, which is a versatile exporting function that can export as PostScript (.eps/.ps), tex (pictex), pdf, jpeg, tiff, png, bmp, svg and wmf (in Windows only). In its basic form, the structure of the function is very simple: `ggsave(filename)`

```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}
ggsave(combined, filename = here("outputs", "combined.png"), dpi=300)
```


Now that we learned the basics of ggplot, let's draw some plot for our experiment data. First, we need to create a dataset with aggregated `truth estimate` scores over `group` and `stage`. We will use this dataset for line and bar graphs.

```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}

aggregated_data_exp1 <- data_exp1 %>%
  group_by(stage, group) %>%
  mutate(truth_estimate = mean(truth_estimate)) %>%
  ungroup()

barplot_exp1 <- aggregated_data_exp1 %>%
  ggplot(aes(x=stage, y= truth_estimate, fill=group)) +
  geom_bar(stat = "identity", position= "dodge")+
  # stat_summary(fun= mean, geom = "bar", position = "dodge")+ # can be used instead of geom_bar() for long dataframes
  labs (x= '', y= "Truth Likelihhod Estimate") + 
  theme_bw() + 
  scale_fill_jama() 

barplot_exp1


barplot_facet_exp1 <- aggregated_data_exp1 %>%
  ggplot(aes(x=group, y= truth_estimate, fill=stage)) +
  geom_bar(stat = "identity", position= "dodge")+
  labs (x= '', y= "Truth Likelihhod Estimate") + 
  theme_bw() + 
  theme(legend.position = "none",
        axis.text=element_text(size=11),
        axis.title = element_text(size = 12)) +
  facet_wrap(~stage)+
  scale_fill_jco() 

barplot_facet_exp1


lineplot_exp1 <- aggregated_data_exp1 %>%
  ggplot(aes(x=factor(stage), y= truth_estimate, group= group, color= group)) +
  geom_line(aes(linetype= group)) +
  geom_point(size= 5)+
  labs (x= '', y= "Truth Likelihhod Estimate") + 
  theme_classic() +
  theme(legend.position = "bottom",
        axis.text=element_text(size=11),
        axis.title = element_text(size = 12)) +
  scale_color_nejm() 

lineplot_exp1


violinplot_exp1 <- data_exp1 %>%
  ggplot(aes(x=factor(stage), y= truth_estimate, fill= group)) +
  geom_violin()+
  labs (x= '', y= "Truth Likelihhod Estimate") + 
  theme_bw() + 
  theme(legend.position = "bottom",
        axis.text=element_text(size=11),
        axis.title = element_text(size = 12)) +
  scale_fill_d3() 

violinplot_exp1


boxplot_exp1 <- data_exp1 %>%
  ggplot(aes(x=factor(stage), y= truth_estimate, fill= group)) +
  geom_boxplot()+
  #geom_point(position = position_dodge(width=0.75), alpha= .5)+
  labs (x= '', y= "Truth Likelihhod Estimate") + 
  theme_bw() + 
  theme(legend.position = "bottom",
        axis.text=element_text(size=11),
        axis.title = element_text(size = 12)) +
  scale_fill_simpsons() 

boxplot_exp1


boxplot_facet_exp1 <- data_exp1 %>%
  ggplot(aes(x=factor(stage), y= truth_estimate, fill= group)) +
  geom_boxplot()+
  labs (x= '', y= "Truth Likelihhod Estimate") + 
  theme_bw() + 
  theme(legend.position = "bottom",
        axis.text=element_text(size=11),
        axis.title = element_text(size = 12),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  facet_wrap(~group)+
  scale_color_simpsons() 

boxplot_facet_exp1


```

HLet's combine our plots:

```{r echo=FALSE, out.width="700px", out.height="350px", fig.cap= "Artwork by Allison Horst: https://github.com/allisonhorst/stats-illustrations"}
knitr::include_graphics(here('inputs','patchwork_1.jpg'))
```

```{r dpi= 300, fig.height=7, fig.width=9}

combined_plot_exp1 <- (barplot_facet_exp1+lineplot_exp1) / (violinplot_exp1+boxplot_exp1)
combined_plot_exp1
```

And here, we save our plots to the `outputs` folder.
```{rmessage=F}
ggsave(combined_plot_exp1, filename = here("outputs","combined_plot_exp1.png"), dpi=300)
```

# Data Analysis


## t-test

Now, we use the treatment data to run three different independent t-tests. Suppose we did an experiment to compare the effectiveness of CBT vs. Psychodynamic therapies in decreasing anxiety, and depression and also in improving life satisfaction:

```{r}
# t.test (indep)
t.test(anxiety~treatment, data= treatment_data)
t.test(depression~treatment, data= treatment_data)
t.test(life_satisfaction~treatment, data= treatment_data)
```

In another experiment, suppose we have created a method to boost memory. Then, we recruit some participants, do a memory pre-test, implement the method, and do a memory post-test, Now, we want to see whether our method have improved participants' memory: 

```{r}
# t.test (paired)
t.test(memory_score~time, data= memory_data, paired= T)
```

Now that we learned about t-test, let's perform this test on our dataset. Is there a difference between groups at the first stage? Ideally, we want participants' ratings at the first stage be similar for both groups because we have not done any manipulations. Previous graphs showed us that ratings of simple and complex group at this stage are pretty close. Let's test that using an **independent t-test** (because we have 2 independent groups):

```{r}
# Is there a difference between groups at the first stage?
data_exp1 %>% 
  group_by(group) %>% 
  filter(stage=='stage1') %>% 
  ungroup () %>%
  t.test(truth_estimate~group, data = ., paired=FALSE)
```

Now, we wonder if opposing arguments were effective at all, regardless of participants' group. So, we would like to test if ratings at the final stage are lower than ratings at the stage 4? Since a pair of score at stage 4 and stage 7 is coming from a same person, we use **paired t-test**.

```{r}
# Is there a difference between ratings of stage4 and stage7?
data_exp1 %>% 
  filter(stage=='stage4' | stage=='stage7') %>% 
  ungroup () %>%
  t.test(truth_estimate~stage, data = ., paired=TRUE)
```


### Exercise

John et al. (2019) investigated the consequences of backing down (changing one's mind in lights of evidence)and how other people view someone who change their mind. In their second experiments, they presented participants either with a person who changes their mind or a person who refuses to back down. Then, they asked participants to rate how intelligent and confident the person is (See the original study [here](https://www.hbs.edu/faculty/Publication%20Files/John%20et%20al%20-%20self-presentational%20consequences_b85b2c43-a5b5-474c-9e2c-e9853b10727e.pdf)). They reported that: 

> "Relative to the entrepreneur who did not back down, participants judged the entrepreneur who backed down as more intelligent (M_backed_down=5.13 out of 7, SD=1.09; M_did_not_back_down=3.97, SD=1.54; t(271.12)=−7.59, p < .001) but less confident (M_backed_down=4.50 out of 7, SD=1.36; M_did_not_back_down=5.65, SD=1.10; t(291.01)=8.08, p < .001).".

Open the `john_backdown_exp2.csv` file and try to reproduce their results. Run two separate independent t-test, one with `intelligent` as the dependent variable and one with `confident` as the dependent variable. For both t-test, use `back_down` as the between-subject independent variable.

```{r message=F, warning=F}
john_data <- read_csv(here("cleaned_data","john_backdown_exp2.csv"))


t.test(intelligent~back_down, data = john_data, paired=FALSE)
t.test(confident~back_down, data = john_data, paired=FALSE)
```


## Analysis of Variance (ANOVA)

Now, let's answer our main question: Do participants in the simple group show higher ratings for supportive arguments (stage 2 to 4) and lower ratings for opposing arguments (stage 5 to 7), compared to participants in the complex group? If this is the case. we expect an interaction in the traditional **Analysis of Variance (AONVA)** test.

```{r message=F, warning=F}
aov_m1 <- aov_car (truth_estimate ~ group*stage +
                     Error(subject/stage), data = data_exp1)
```

```{r echo=F}
knitr::kable(nice(aov_m1)) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = T)
```

As you can see, we found a significant main effect of stage and a significant group by stage interaction. We can use the `emmeans` package to do post-hoc tests.

```{r warning=F, message=F}
# main effect of stage
emmeans(aov_m1, 'stage')
pairs(emmeans(aov_m1, 'stage'), adjust= 'holm')
```


```{r warning=F, message=F}
# group by stage interaction
emmeans(aov_m1, "group", by= "stage")
update(pairs(emmeans(aov_m1, "group", by= "stage")), by = NULL, adjust = "holm") 
```

You can use the `afex_plot` function from afex to create beautiful plots. Those plots interacts nicely with ggplot:
```{r message=F, warning=F, dpi= 300}
afex_plot(aov_m1, x = "stage", trace = "group", error='between',
          line_arg = list(size=1),
          point_arg = list(size=3.5),
          data_arg = list(size= 1, color= 'grey', width=.4),
          data_geom = geom_boxplot,
          mapping = c("linetype", "shape", "fill"),
          legend_title = "Group") +
  labs(y = "Truth Likelihhod Estimate", x = "") +
  theme_bw()+ # remove the grey background and grid
  theme(axis.text=element_text(size=13),
        axis.title = element_text(size = 13),
        legend.text=element_text(size=13),
        legend.title=element_text(size=13),
        legend.position='bottom',
        legend.key.size = unit(1, "cm"),
        legend.background = element_rect(colour = 'black', fill = 'white', linetype='solid'))+
  scale_color_simpsons() +
  scale_fill_simpsons()
```


If you are interested in this topic, check out this nice tutorial about [using afex to run ANOVA](https://cran.r-project.org/web/packages/afex/vignettes/afex_anova_example.html), and also this interesting tutorial on the [emmeans package](https://aosmith.rbind.io/2019/03/25/getting-started-with-emmeans/).

### Exercise

Rotello et al. (2018) investigated the association between the race (White vs. Black faces) and the gun-tool judgments. In their first experiments, they presented participants with 16 White male faces and 16 Black male faces, and following that 8 images of guns and 8 images of tools. They asked participants to judge if the object is a tool or a gun by pressing keyboard buttons. Then, they ran an ANOVA to see if participants' gun responses are higher for any of the races. So, they included prime race (Black, White) and target identity (gun, tool) as independent variables and participants' gun responses as dependent variable into their linear model (See the original study [here](https://online.ucpress.edu/collabra/article/4/1/32/112986/The-Shape-of-ROC-Curves-in-Shooter-Tasks)). They found that: 

> "Participants made more gun responses to guns than to tools, F(1,45) = 53243, p < 0.0001, η2g = 0.998. However, the race of the prime face did not matter, F(1,45) = 0.287, p > 0.59, η2g = 0.001, nor was there an interaction of prime race with target object, F(1,45) = 0.022, p > 0.88, η2g = 0.000)".

Open the `rotello_shooter_exp1.csv` file and try to reproduce their results. Run an ANOVA (type III) with `resp` as the dependent variable and target, prime, and their interaction as independent variables.


```{r message=F, warning=F}
# load the general data file
rotello_data <- read_csv(here("cleaned_data","rotello_shooter_exp1.csv"))

# ANOVA
rotello_aov <- aov_car (resp ~ target*prime +
           Error(subject/target*prime), data = rotello_data)
```

```{r echo=F}
knitr::kable(nice(rotello_aov)) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = T)
```



## Correlation

Here, we want to check the correlation between variables on the `narcissism_data`. First, we need to remove `subject` column because it is not numeric:
```{r message=F, warning=F}
narcissism_data_cor <- narcissism_data %>%
  select(-subject)
```

```{r message=F, eval=F, fig.align='center', dpi=300}

#-- Base R:
cor(narcissism_data_cor, method = "pearson",  use = "complete.obs")

#-- Psych library:
psych::pairs.panels(narcissism_data_cor, method = "pearson", hist.col = "#00AFBB", density = T, ellipses = F, stars = T)

#-- Correlation library:
# install.packages("devtools")
# devtools::install_github("easystats/correlation")
#library("correlation")
correlation::correlation(narcissism_data_cor) %>% summary()

#-- apaTables library:
narcissism_data_cor %>% 
  apaTables::apa.cor.table(filename="./outputs/CorMatrix.doc", show.conf.interval=T)
```

```{r message=F, echo=F, fig.align='center', dpi=300}

#-- Base R:
cor(narcissism_data_cor, method = "pearson",  use = "complete.obs")%>%
  knitr::kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = F)

#-- Psych library:
psych::pairs.panels(narcissism_data_cor, method = "pearson", hist.col = "#00AFBB", density = T, ellipses = F, stars = T)

#-- Correlation library:
# install.packages("devtools")
# devtools::install_github("easystats/correlation")
#library("correlation")
correlation::correlation(narcissism_data_cor) %>% summary()%>%
  knitr::kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = F)

```


Now that we learned about correlation test, let's answer to another question of this study: does persuasion and dissuasion is related to open-mindedness, cognitive ability, reasoning abilities, and thinking style? To answer this question, we need to create two indexes (scores) one for persuasion and one for dissuasion. Then we can do a correlation test:

```{r message=F, eval=F, fig.align='center', dpi=300}

cor_data_exp1 <- data_exp1 %>% 
  pivot_wider(names_from = stage, values_from = truth_estimate) %>%
  group_by(subject) %>%
  mutate(persuasion_index= stage2+ stage3+ stage4 - stage1,
         dissuasion_index= (101-stage5) + (101-stage6) + (101-stage7) - (101-stage4)) %>%
  ungroup()%>%
  dplyr::select(persuasion_index,dissuasion_index,openminded_total,numeracy_total,thinking_total,reasoning_total)

#---------- Base R:
cor(cor_data_exp1, method = "pearson",  use = "complete.obs")

#---------- Psych library:
cor_data_exp1 %>% 
  psych::pairs.panels(method = "pearson", hist.col = "#00AFBB", density = T, ellipses = F, stars = T)

#---------- Correlation library:
correlation::correlation(cor_data_exp1) %>% summary()

#---------- apaTables library:
cor_data_exp1 %>% 
  apaTables::apa.cor.table(filename="./outputs/CorMatrix.doc", show.conf.interval=T)
```

```{r message=F, echo=F, fig.align='center', dpi=300}
cor_data_exp1 <- data_exp1 %>% 
  pivot_wider(names_from = stage, values_from = truth_estimate) %>%
  group_by(subject) %>%
  mutate(persuasion_index= stage2+ stage3+ stage4 - stage1,
         dissuasion_index= (101-stage5) + (101-stage6) + (101-stage7) - (101-stage4)) %>%
  ungroup()%>%
  dplyr::select(persuasion_index,dissuasion_index,openminded_total,numeracy_total,thinking_total,reasoning_total)

#---------- Base R:
cor(cor_data_exp1, method = "pearson",  use = "complete.obs")%>%
  knitr::kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = F)%>%
  scroll_box(width = "780px")

#---------- Psych library:
cor_data_exp1 %>% 
  psych::pairs.panels(method = "pearson", hist.col = "#00AFBB", density = T, ellipses = F, stars = T)

#---------- Correlation library:
correlation::correlation(cor_data_exp1) %>% summary()%>%
  knitr::kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = F)%>%
  scroll_box(width = "780px")

```


### Exercise

Pennycook et al. (2020) investigated the relationship between actively open-minded thinking style about evidence (AOT-E) and different political, scientific, and religious beliefs (see the original paper [here](https://psyarxiv.com/a7k96)). In their first experiment, they calculated the correlation of AOTE and scientific beliefs items (global warming, evolution, etc.) and they found the following results:

```{r echo=FALSE, out.width="700px", out.height="350px", fig.cap= "adapted from [Pennycook et al. (2020)](https://psyarxiv.com/a7k96)"}
knitr::include_graphics(here('inputs','pennycook_corr.png'))
```

Open the `pennycook_aote_exp1.csv` file and try to reproduce their results by creating the same correlation matrix.

```{r message=F, eval=F}
pennycook_data <- read_csv(here("cleaned_data","pennycook_aote_exp1.csv")) 


#---------- Base R:
cor(pennycook_data, method = "pearson",  use = "complete.obs")

#---------- Psych library:
pennycook_data %>% 
  psych::pairs.panels(method = "pearson", hist.col = "#00AFBB", density = T, ellipses = F, stars = T)

#---------- Correlation library:
correlation::correlation(pennycook_data) %>% summary()

#---------- apaTables library:
pennycook_data %>% 
  apaTables::apa.cor.table(filename="./outputs/CorMatrix.doc", show.conf.interval=T)
```


```{r message=F, eval=T, echo=F, fig.align='center', dpi=300}
pennycook_data <- read_csv(here("cleaned_data","pennycook_aote_exp1.csv")) %>%
  clean_names()

correlation::correlation(pennycook_data) %>% summary() %>%
  knitr::kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = F)%>%
  scroll_box(width = "780px")

```


## Linear Regression

Here, we do single and multiple linear regreassion on the `narcissism_data`:

```{r}
m1 <- lm(mental_health~narcissism, data= narcissism_data)
```

```{r message=F, eval=T, echo=F, fig.align='center', dpi=300}
broom::tidy(m1)%>%
  knitr::kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = T)
```

```{r}
m2 <- lm(mental_health~narcissism+psychopathy, data= narcissism_data)
```

```{r message=F, eval=T, echo=F, fig.align='center', dpi=300}
broom::tidy(m2)%>%
  knitr::kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = T)
```

Now, let's perform regression analyses on our own dataset. In the previous section, we found that open-mindedness (AOT-E) is correlated with persuasion. Now, one may ask if open-mindedness can predict persuasion after controlling for reasoning and controlling abilities? To answer that, we can run a multiple regression analysis:
```{r}
exp1_reg=lm(persuasion_index ~ openminded_total+ numeracy_total+ thinking_total+ reasoning_total,
                  data=cor_data_exp1)
```

```{r message=F, eval=T, echo=F, fig.align='center', dpi=300}
broom::tidy(exp1_reg)%>%
  knitr::kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = T)
```

### Exercise

Trémolière and Djeriouat (2020) examined the role of *cognitive reflection* and *belief in science* in climate change skepticism. In their first study, they revealed that cognitive reflection and belief in science negetively predicted climate change skepticism even after controlling for demographic and cognitive ability variables (see the original paper [here](https://psyarxiv.com/vp8k6/)). 

```{r echo=FALSE, out.width="700px", out.height="350px", fig.cap= "adapted from [Trémolière and Djeriouat (2020)](https://psyarxiv.com/vp8k6/)"}
knitr::include_graphics(here('inputs','tremoliere_reg.png'))
```

Open the `tremoliere_data_exp1.csv` file and try to reproduce their results by running a multiple linear regression. Enter age, gender, education, belief in science, literacy, numeracy (Numtotal), and cognitive reflection as predictors and enter climate change skepticism (climato) as the outcome variable.

```{r message=F}
Tremoliere_data <- read_csv(here("cleaned_data","tremoliere_data_exp1.csv"))

Tremoliere_reg=lm(Climato ~ Age+ Gender+ Education+ BeliefInSciencetotal+ Literacy+ Numtotal+ CognitiveReflection,
                    data=Tremoliere_data)
```


```{r message=F, eval=T, echo=F, fig.align='center', dpi=300}
broom::tidy(Tremoliere_reg)%>%
  knitr::kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = T)

glance(Tremoliere_reg)%>%
  knitr::kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = F)%>%
  scroll_box(width = "780px")
```


# Rmarkdown

To be completed...


```{r echo=FALSE, out.width="700px", out.height="350px", fig.cap= "Artwork by Allison Horst: https://github.com/allisonhorst/stats-illustrations"}
knitr::include_graphics(here('inputs','rmarkdown_wizards.png'))
```


```{r echo=FALSE, out.width="700px", out.height="350px", fig.cap= "Artwork by Allison Horst: https://github.com/allisonhorst/stats-illustrations"}
knitr::include_graphics(here('inputs','reproducibility_court.png'))
```

# References

- Ghasemi, O., Handley, S., & Howarth, S. (2020). The Bright Homunculus in our Head: Individual Differences in Intuitive Sensitivity to Logical Validity.

- John, L. K., Jeong, M., Gino, F., & Huang, L. (2019). The self-presentational consequences of upholding one’s stance in spite of the evidence. Organizational Behavior and Human Decision Processes, 154, 1-14.

- Pennycook, G., Cheyne, J. A., Koehler, D. J., & Fugelsang, J. A. (2020). On the belief that beliefs should change according to evidence: Implications for conspiratorial, moral, paranormal, political, religious, and science beliefs. Judgment and Decision Making, 15(4), 476.

- Rotello, C. M., Kelly, L. J., Heit, E., Vazire, S., & Vul, E. (2018). The Shape of ROC Curves in Shooter Tasks: Implications for Best Practices in Analysis. Collabra: Psychology, 4(1).

- Trémolière, B., & Djeriouat, H. (2020). Don’t you see that its cold! Exploring the roles of cognitive reflection, climate science literacy, illusion of knowledge, and political orientation in climate change skepticism.

- Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), 1-23.